OM nucleic - nucleic search, using sw model

Run on: February 26, 2004, 00:47:23 ; Search time 3410.29 Seconds
(without alignments)
17679.337 Million cell updates/sec



Title: US-09-989-981A-3
Perfect score: 2019
Sequence: 1 atggctgagaaaaccaaaga agtcaattcaagactggtga 2019

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 27513289 seqs, 14931090276 residues

Total number of hits satisfying chosen parameters:

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
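The throughput figure in the header can be cross-checked from the other header values. The sketch below multiplies the query length by the number of database residues and divides by the search time. The factor of two for searching both strands is an assumption, not something the report states; with it, the result lands close to the reported 17679.337 million cell updates/sec.

```python
# Figures copied from the report header.
QUERY_LENGTH = 2019             # query spans positions 1..2019
DB_RESIDUES = 14_931_090_276    # "Searched: ... 14931090276 residues"
SEARCH_SECONDS = 3410.29        # "Search time 3410.29 Seconds"
STRANDS = 2                     # assumption: both strands were searched


def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=STRANDS):
    """Dynamic-programming cells filled per second, in millions."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


rate = million_cell_updates_per_sec(QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS)
print(f"{rate:.1f} million cell updates/sec")
```

The small remaining difference from the reported value is within what rounding the search time to 0.01 s would produce.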
1: em_estba:*
2: em_esthum:*
3: em_estin:*
4: em_estmu:*
5: em_estov:*
6: em_estpl:*
7: em_estro:*
8: em_htc:*
9: gb_est1:*
10: gb_est2:*
11: gb_htc:*
12: gb_est3:*
13: gb_est4:*
14: gb_est5:*
15: em_estfun:*
16: em_estom:*
17: em_gss_hum:*
18: em_gss_inv:*
19: em_gss_pln:*
20: em_gss_vrt:*
21: em_gss_fun:*
22: em_gss_mam:*
23: em_gss_mus:*
24: em_gss_pro:*
25: em_gss_rod:*
26: em_gss_phg:*
27: em_gss_vrl:*



28: gb_gss1:*
29: gb_gss2:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
RESULT 1
AK004871
LOCUS       AK004871                3623 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus adult male liver cDNA, RIKEN full-length enriched
            library, clone:1300003C16 product:ATP-BINDING CASSETTE, SUB-FAMILY
            G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full insert
            sequence.
ACCESSION   AK004871
VERSION     AK004871.1  GI:12836380
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system -- 384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 3623)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. Second
            strand cDNA was prepared with the primer adapter of sequence [5'
            GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTTAATTAAACCCCCCCCCCC 3']. cDNA was
            cleaved with XhoI and SstI. Cloning sites, 5' end: SstI; 3' end:
            XhoI. Host: SOLR.
FEATURES             Location/Qualifiers
     source          1..3623
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:1300003C16"
                     /db_xref="MGI:1896857"
                     /db_xref="taxon:10090"
                     /clone="1300003C16"
                     /sex="male"
                     /tissue_type="liver"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     CDS             69..2090
                     /note="unnamed protein product; ATP-BINDING CASSETTE,
                     SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus]
                     (SWISSPROT|Q9DBM0, evidence: FASTY, 92%ID, 96.7%length,
                     match=1796)
                     putative"
                     /codon_start=1
                     /protein_id="BAB23630.1"
                     /db_xref="GI:12836381"
                     /translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ
                     SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML
                     AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP
                     NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE
                     RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD
                     IFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKE
                     REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL
                     PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA
                     LLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV
                     IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNAL
                     YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI
                     LGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
     polyA_signal    3605..3610
                     /note="putative"
     polyA_site      3623
                     /note="putative"
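The CDS feature (69..2090, codon_start=1) can be checked against the aligned nucleotide sequence by translating it. The sketch below is a minimal illustration using the standard genetic code; the helper name `translate` is hypothetical, and the two input strings are the first and last database rows of the RESULT 1 alignment (Db 69..128 and Db 2049..2090) with whitespace removed.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string in frame 1; '*' marks a stop codon."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


# First and last rows of the CDS as they appear in the alignment.
first_row = "ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT"
last_row = "TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA"

cds_start, cds_end = 69, 2090
protein_length = (cds_end - cds_start + 1) // 3 - 1  # minus the stop codon

print(translate(first_row))
print(translate(last_row))
print(protein_length)
```

The first row translates to the opening residues of the /translation qualifier, the last row ends in a stop codon, and the CDS span corresponds to a 673-residue product.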

ORIGIN 

Query Match 99.4%; Score 2006; DB 11; Length 3623; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 2019; Conservative 0; Mismatches 0; Indels 3; Gaps 1; 
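The summary figures for each result can be reproduced from the match counts and the gap parameters in the header. The sketch below assumes a match scores 1 (consistent with the perfect score of 2019 for a 2019-base query), that each gap costs Gapop once plus Gapext per inserted or deleted base, and that a mismatch costs 0.6. The mismatch value is inferred from the reported numbers rather than stated anywhere in the report, so treat it as an assumption.

```python
# Match and gap costs follow the report header (IDENTITY_NUC, Gapop 10.0,
# Gapext 1.0). MISMATCH is an inferred assumption, not a documented value.
MATCH = 1.0
MISMATCH = -0.6
GAP_OPEN = 10.0
GAP_EXTEND = 1.0


def alignment_score(matches, mismatches, indels, gaps):
    """Score = matches + mismatch penalties - gap opens - gap extensions."""
    return (matches * MATCH + mismatches * MISMATCH
            - gaps * GAP_OPEN - indels * GAP_EXTEND)


def query_match_pct(score, perfect_score=2019):
    """Score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns."""
    return round(100.0 * matches / (matches + mismatches + indels), 1)


# Counts copied from the three result summaries in this report.
results = {
    "AK004871": dict(matches=2019, mismatches=0, indels=3, gaps=1),
    "AK050938": dict(matches=1700, mismatches=0, indels=0, gaps=0),
    "BI330745": dict(matches=799, mismatches=23, indels=11, gaps=6),
}
for name, r in results.items():
    s = alignment_score(**r)
    print(name, round(s, 1), query_match_pct(s),
          best_local_similarity_pct(r["matches"], r["mismatches"], r["indels"]))
```

Under these assumptions the three results give scores of 2006, 1700 and 714.2, which agree with the Score, Query Match and Best Local Similarity values printed in the report.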

Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     69 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 128

Qy     61 TC---GGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 117
          ||   |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    129 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 188

Qy    118 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 177
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    189 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 248

Qy    178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    249 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 308

Qy    238 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 297
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    309 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 368

Qy    298 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 357
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    369 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 428

Qy    358 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 417
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    429 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 488

Qy    418 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 477
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    489 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 548

Qy    478 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTC 537
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    549 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTC 608
Qy    538 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 597
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    609 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 668

Qy    598 TGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGA 657
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    669 TGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGA 728

Qy    658 CGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCC 717
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    729 CGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCC 788

Qy    718 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 777
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    789 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 848

Qy    778 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 837
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    849 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 908

Qy    838 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 897
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    909 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 968

Qy    898 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 957
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    969 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 1028

Qy    958 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1017
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1029 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1088

Qy   1018 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1077
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1089 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1148

Qy   1078 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1137
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1149 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1208

Qy   1138 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1209 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1268

Qy   1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1269 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1328

Qy   1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1329 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1388

Qy   1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1389 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1448



Qy   1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1449 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1508

Qy   1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1509 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1568

Qy   1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1569 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1628

Qy   1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1629 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1688

Qy   1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1689 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1748

Qy   1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1749 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1808

Qy   1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1809 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1868

Qy   1798 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1857
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1869 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1928

Qy   1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1929 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1988

Qy   1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1989 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 2048

Qy   1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
          ||||||||||||||||||||||||||||||||||||||||||
Db   2049 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2090





RESULT 2
AK050938
LOCUS       AK050938                2417 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus 9 days embryo whole body cDNA, RIKEN full-length
            enriched library, clone:D030040P06 product:ATP-BINDING CASSETTE,
            SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full
            insert sequence.
ACCESSION   AK050938
VERSION     AK050938.1  GI:26094211
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system -- 384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 2417)
  AUTHORS   Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
            Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
            Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
            Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
            Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
            Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site for further details.
            URL:http://genome.gsc.riken.go.jp/
            URL:http://fantom.gsc.riken.go.jp/.
FEATURES             Location/Qualifiers
     source          1..2417
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:D030040P06"
                     /db_xref="MGI:2418860"
                     /db_xref="taxon:10090"
                     /clone="D030040P06"
                     /tissue_type="whole body"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="9 days embryo"
     misc_feature    1..2417
                     /note="ATP-BINDING CASSETTE, SUB-FAMILY G, MEMBER 8
                     (STEROLIN-2) homolog [Mus musculus] (SWISSPROT|Q9DBM0,
                     evidence: FASTY, 92%ID, 96.7%length, match=1796)"
ORIGIN

Query Match 84.2%; Score 1700; DB 11; Length 2417;
Best Local Similarity 100.0%; Pred. No. 0;
Matches 1700; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    320 CAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGA 379
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    184 CAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGA 243

Qy    380 TGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGT 439
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    244 TGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGT 303

Qy    440 GCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCC 499
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    304 GCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCC 363

Qy    500 TGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAAC 559
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    364 TGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAAC 423

Qy    560 GGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCA 619
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    424 GGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCA 483

Qy    620 ACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGC 679
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    484 ACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGC 543

Qy    680 TCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCA 739
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    544 TCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCA 603

Qy    740 CAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCA 799
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    604 CAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCA 663

Qy    800 TCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGA 859
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    664 TCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGA 723

Qy    860 CATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCA 919
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    724 CATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCA 783

Qy    920 TTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCA 979
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    784 TTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCA 843

Qy    980 TCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAG 1039
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    844 TCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAG 903

Qy   1040 CCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTGTGGAAAGCTGAGGCAAAGG 1099
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    904 CCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTGTGGAAAGCTGAGGCAAAGG 963

Qy   1100 AACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACACAGGACACTGACTGTGGGA 1159
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    964 AACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACACAGGACACTGACTGTGGGA 1023

Qy   1160 CTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGATTT 1219
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1024 CTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGATTT 1083

Qy   1220 CCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGATGT 1279
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1084 CCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGATGT 1143

Qy   1280 CCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGGACA 1339
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1144 CCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGGACA 1203

Qy   1340 CAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATGTCG 1399
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1204 CAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATGTCG 1263

Qy   1400 TCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACA 1459
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1264 TCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACA 1323

Qy   1460 CTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACG 1519
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1324 CTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACG 1383

Qy   1520 TCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCT 1579
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1384 TCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCT 1443

Qy   1580 TCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGG 1639
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1444 TCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGG 1503

Qy   1640 CTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACA 1699
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1504 CTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACA 1563

Qy   1700 ACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTG 1759
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1564 ACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTG 1623

Qy   1760 CATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAAT 1819
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1624 CATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAAT 1683

Qy   1820 TTAATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACA 1879
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1684 TTAATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACA 1743

Qy   1880 CGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCA 1939
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1744 CGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCA 1803

Qy   1940 TCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGA 1999
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1804 TCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGA 1863

Qy   2000 AGTCAATTCAAGACTGGTGA 2019
          ||||||||||||||||||||
Db   1864 AGTCAATTCAAGACTGGTGA 1883





RESULT 3
BI330745
LOCUS       BI330745                 849 bp    mRNA    linear   EST 30-JUL-2001
DEFINITION  602982409F1 NCI_CGAP_Li9 Mus musculus cDNA clone IMAGE:5135115 5',
            mRNA sequence.
ACCESSION   BI330745
VERSION     BI330745.1  GI:15015402
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 849)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Jeffrey E. Green, M.D.
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM11332 row: a column: 04
            High quality sequence stop: 758.
FEATURES             Location/Qualifiers
     source          1..849
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="FVB/N"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:5135115"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /clone_lib="NCI_CGAP_Li9"
                     /note="Organ: liver; Vector: pCMV-SPORT6; Site_1: NotI;
                     Site_2: SalI; Cloned unidirectionally. Primer: Oligo dT.
                     Average insert size 1.9 kb. Constructed by Life
                     Technologies. Note: this is a NCI_CGAP Library."
ORIGIN

Query Match 35.4%; Score 714.2; DB 12; Length 849; 

Best Local Similarity 95.9%; Pred. No. 1.9e-163; 

Matches 799; Conservative 0; Mismatches 23; Indels 11; Gaps 6; 
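Each result ends its record with a three-line summary of `name value;` pairs, which is straightforward to parse. The sketch below shows one way to read such a block into a dictionary; the function name `parse_stats` and the regular expression are illustrative choices, not part of the search tool.

```python
import re

# A field is a label (letters, dots, spaces), a number with an optional
# exponent, an optional percent sign, and a terminating semicolon.
FIELD = re.compile(
    r"([A-Za-z][A-Za-z. ]*?)\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)%?;"
)


def parse_stats(block):
    """Return {label: float} for every 'label value;' pair in the block."""
    return {label.strip(): float(value) for label, value in FIELD.findall(block)}


summary = """\
Query Match 35.4%; Score 714.2; DB 12; Length 849;
Best Local Similarity 95.9%; Pred. No. 1.9e-163;
Matches 799; Conservative 0; Mismatches 23; Indels 11; Gaps 6;
"""

stats = parse_stats(summary)
print(stats["Score"], stats["Pred. No."], stats["Gaps"])
```

Percent signs are dropped, so `Query Match` and `Best Local Similarity` come back as plain percentages.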

Qy    891 GCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCC 950
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCC 60

Qy    951 TGCGGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGC 1010
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TGCGGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGC 120

Qy   1011 CACCGTGGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGA 1070
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 CACCGTGGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGA 180

Qy   1071 TGACTTTCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCT 1130
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TGACTTTCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCT 240

Qy   1131 GACCCTCACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCA 1190
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 GACCCTCACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCA 300

Qy   1191 GTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCT 1250
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCT 360

Qy   1251 CATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCA 1310
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 CATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCA 420

1311 TGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCT 1370 
I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II 
421 TGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCT 4 80 



Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 



Qy 1371 CAT T C CTT T CAAT GT CAT C CT GGAT GT C GT CT C CAAAT GT CACT C GGAGAGGT CAAT GC T 1430 

| | | | M M I I I I I I I I I I M I I I I I I I I I I I I M I I I I II I I I I I I I II I I I I I I I I I I I 
Db 4 81 CAT T C CTT T CAAT GT CAT C CT GGAT GT C GT CT C CAAAT GT CACT CG GAGAGGT CAAT GCT 540 

Qy 1431 GTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCT 1490 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db * 541 GTACT AT GAGCT GGAAGAC G GG CT GT — ACTGCTGGTCCTTATTTCTTTGCCAAGATCCT 598 

Qy 14 91 AGGAGAAT T GC CGGAGCAC - T GT GCCTAC GT CAT CAT CT ACGC GAT GCC CAT CT ACT GGC 154 9 

I I I I I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I II I I I 
Db 599 AGGAGAAT T GC C GGAGCACTT GT GCCT ACGT C ATCATCT ACGC GAT GC C CAT CT ACT GGC 658 

Qy 1550 TGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTAC — ACTTCCTGCTCGTGTGGTT 1607 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I 

Db 659 TGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACCACTTTCCTGCTCGTGTGGTA 718 

Qy 1608 GGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCA 1667 

I I I I I I I I I I I I II I II I I I II I I I I I I I I I I I I I I I I M 

Db 719 GGAGGT CTT CTGCT GCAGGACAT GGCCTT GG TGCTCTGCCATGCTG-CCAACTTCCA 774 

Qy 1668 CATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCG 1720 

I I I I I I I I I I I I I I I II I I I IN I I I I I I I I I I I I I I III 

Db 775 CATGTCCTCCTTCTTCTGCA — TGCCTCTTAGAATCCTTCTACCTTATGGCGG 825 
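
The percentages in a result summary can be recomputed from the counts printed beside them. The sketch below is a minimal Python illustration, not the search program's own code. It assumes (inferred from the figures in this listing, not stated by the program) that Query Match is the score divided by the perfect score of 2019, and that Best Local Similarity is the number of matches divided by the number of aligned columns (matches + mismatches + indels). The function names are hypothetical.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score expressed as a percentage of the query's perfect score."""
    return 100.0 * score / perfect_score


def local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Matches as a percentage of aligned columns (assumed definition)."""
    columns = matches + mismatches + indels
    return 100.0 * matches / columns


# RESULT 3 (BI330745): Score 714.2; Matches 799; Mismatches 23; Indels 11
print(round(query_match_pct(714.2, 2019), 1))        # 35.4
print(round(local_similarity_pct(799, 23, 11), 1))   # 95.9
```

Both values agree with the "Query Match 35.4%" and "Best Local Similarity 95.9%" lines reported for RESULT 3.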



RESULT 4
BF660076
LOCUS       BF660076     549 bp    mRNA    linear   EST 20-DEC-2000
DEFINITION  maa27c08.y1 NCI_CGAP_Li10 Mus musculus cDNA clone IMAGE:3812342 5'
            similar to TR:Q9VQN4 Q9VQN4 CG9664 PROTEIN. ;, mRNA sequence.
ACCESSION   BF660076
VERSION     BF660076.1  GI:11925210
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 549)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Other_ESTs: maa27c08.x1
            Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Jeffrey E. Green, M.D.
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Washington University Genome Sequencing Center
            Clone distribution: NCI-CGAP clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            MGI:1454454
            Seq primer: -40RP from Gibco
            High quality sequence stop: 435.
FEATURES             Location/Qualifiers
     source          1..549
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:3812342"
                     /sex="female"
                     /dev_stage="10 weeks"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /clone_lib="NCI_CGAP_Li10"
                     /note="Organ: liver; Vector: pCMV-SPORT6; Site_1: NotI;
                     Site_2: SalI; Cloned unidirectionally. Primer: Oligo dT.
                     Average insert size 1.6 kb. Library constructed by Life
                     Technologies."
ORIGIN

Query Match 27.2%; Score 549; DB 10; Length 549; 

Best Local Similarity 100.0%; Pred. No. 3.9e-123; 

Matches 549; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1462 GCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTC 1521
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTC 60

Qy 1522 ATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTC 1581
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 ATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTC 120

Qy 1582 CTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCT 1641
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 CTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCT 180

Qy 1642 GCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAAC 1701
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 GCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAAC 240

Qy 1702 TCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCA 1761
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 TCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCA 300

Qy 1762 TGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTT 1821
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 TGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTT 360

Qy 1822 AATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACG 1881
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 AATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACG 420

Qy 1882 ATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATC 1941
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 ATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATC 480

Qy 1942 GGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAG 2001
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 GGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAG 540

Qy 2002 TCAATTCAA 2010
        |||||||||
Db  541 TCAATTCAA 549
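
A Qy or Db line can be checked mechanically: the end coordinate minus the start coordinate plus one should equal the number of non-gap residues on the line. The following is a minimal Python sketch of that check; it tolerates stray whitespace inside the sequence field. The regular expression and the function names are illustrative assumptions and are not part of the search program.

```python
import re

# Label, start coordinate, sequence (spaces and '-' gaps allowed), end coordinate.
LINE_RE = re.compile(r"^(Qy|Db)\s+(\d+)\s+([ACGTNacgtn\- ]+?)\s+(\d+)\s*$")


def parse_alignment_line(line: str):
    """Return (label, start, sequence_without_spaces, end) for one Qy/Db line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an alignment line: {line!r}")
    label, start, seq, end = m.groups()
    return label, int(start), seq.replace(" ", "").upper(), int(end)


def span_is_consistent(start: int, seq: str, end: int) -> bool:
    """True when the coordinate span equals the count of non-gap residues."""
    return end - start + 1 == len(seq.replace("-", ""))


def count_identities(qy: str, db: str) -> int:
    """Number of columns where both sequences carry the same residue."""
    return sum(1 for a, b in zip(qy, db) if a == b and a != "-")


_, q_start, q_seq, q_end = parse_alignment_line("Qy 2002 TCAATTCAA 2010")
_, d_start, d_seq, d_end = parse_alignment_line("Db  541 TCA ATT CAA 549")
print(span_is_consistent(q_start, q_seq, q_end),
      span_is_consistent(d_start, d_seq, d_end),
      count_identities(q_seq, d_seq))
```

Summing `count_identities` over every block of a result should reproduce that result's "Matches" figure, which gives a quick way to spot a damaged line.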



RESULT 5
BY705076
LOCUS       BY705076     583 bp    mRNA    linear   EST 16-DEC-2002
DEFINITION  BY705076 RIKEN full-length enriched, adult male liver Mus musculus
            cDNA clone 1300003C16 5', mRNA sequence.
ACCESSION   BY705076
VERSION     BY705076.1  GI:27116215
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 583)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
            Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
            Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
            Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
            Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
            Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
            Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
            Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
            Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
            Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
            Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..583
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="1300003C16"
                     /sex="male"
                     /tissue_type="liver"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male liver"
ORIGIN

Query Match 24.7%; Score 498; DB 13; Length 583; 

Best Local Similarity 99.4%; Pred. No. 1.2e-110; 

Matches 511; Conservative 0; Mismatches 0; Indels 3; Gaps 1; 

Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   69 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 128

Qy   61 TC---GGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 117
        ||   |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  129 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 188

Qy  118 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 177
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  189 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 248

Qy  178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  249 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 308

Qy  238 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 297
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  309 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 368

Qy  298 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 357
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  369 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 428

Qy  358 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 417
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  429 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 488

Qy  418 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 477
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  489 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 548

Qy  478 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTG 511
        ||||||||||||||||||||||||||||||||||
Db  549 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTG 582
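
The reported scores are consistent with a simple affine-gap tally of the summary counts. The sketch below is a hedged Python illustration: the gap parameters come from the run header ("Gapop 10.0 , Gapext 1.0"), but the match reward of 1 and the mismatch weight of -0.6 are assumptions inferred from the scores printed in this listing, not documented values of the IDENTITY_NUC table. The function name is hypothetical.

```python
GAP_OPEN = 10.0    # "Gapop 10.0" in the run header
GAP_EXTEND = 1.0   # "Gapext 1.0" in the run header
MATCH = 1.0        # assumption: one point per identical column
MISMATCH = -0.6    # assumption: inferred from the reported scores


def expected_score(matches: int, mismatches: int, indels: int, gaps: int) -> float:
    """Tally a score from summary counts under the assumed weights.

    Each gap pays the opening penalty once, and every gapped column
    (indel) pays the extension penalty.
    """
    return (matches * MATCH
            + mismatches * MISMATCH
            - gaps * GAP_OPEN
            - indels * GAP_EXTEND)


# RESULT 5 (BY705076): Matches 511; Mismatches 0; Indels 3; Gaps 1
print(round(expected_score(511, 0, 3, 1), 1))    # 498.0
# RESULT 3 (BI330745): Matches 799; Mismatches 23; Indels 11; Gaps 6
print(round(expected_score(799, 23, 11, 6), 1))  # 714.2
```

Under these assumptions the tally reproduces the "Score 498" and "Score 714.2" lines, so a summary whose counts do not reproduce its score is a sign of a transcription error.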



RESULT 6
AA537862
LOCUS       AA537862     463 bp    mRNA    linear   EST 29-JUL-1997
DEFINITION  vj35a03.r1 Stratagene mouse diaphragm (#937303) Mus musculus cDNA
            clone IMAGE:930988 5', mRNA sequence.
ACCESSION   AA537862
VERSION     AA537862.1  GI:2283855
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 463)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:535908
            Seq primer: -28m13 rev1 ET from Amersham
            High quality sequence stop: 393.
FEATURES             Location/Qualifiers
     source          1..463
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:930988"
                     /tissue_type="diaphragm"
                     /dev_stage="adult"
                     /lab_host="SOLR (kanamycin resistant)"
                     /clone_lib="Stratagene mouse diaphragm (#937303)"
                     /note="Organ: diaphragm; Vector: pBluescript SK-; Site_1:
                     EcoRI; Site_2: XhoI; Cloned unidirectionally from mRNA
                     prepared from diaphragm muscle. Primer: Oligo dT. Average
                     insert size: 1.5 kb. Uni-ZAP XR Vector; ~5' adaptor
                     sequence: 5' GAATTCGGCACGAG 3' ~3' adaptor sequence: 5'
                     CTCGAGTTTTTTTTTTTTTTTTTT 3'"
ORIGIN

Query Match 22.2%; Score 448.2; DB 9; Length 463; 

Best Local Similarity 98.3%; Pred. No. 1.7e-98; 

Matches 453; Conservative 0; Mismatches 8; Indels 0; Gaps 0;

Qy 1018 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1077
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 60

Qy 1078 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1137
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 120

Qy 1138 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 180

Qy 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 240

Qy 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
        |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCA 300

Qy 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
         |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 360

Qy 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
        |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
Db  361 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGCTCAATGCTGTACTAT 420

Qy 1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTT 1478
        ||||||||||||||||||||||||||   | | ||||||||
Db  421 GAGCTGGAAGACGGGCTGTACACTGCCAATACATATTTCTT 461



RESULT 7
BB610072
LOCUS       BB610072     510 bp    mRNA    linear   EST 26-OCT-2001
DEFINITION  BB610072 RIKEN full-length enriched, adult male liver Mus musculus
            cDNA clone 1300007N20 5', mRNA sequence.
ACCESSION   BB610072
VERSION     BB610072.1  GI:16451685
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 510)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
            Hayashizaki,Y.
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome. 12, 673-677 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details,
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..510
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="1300007N20"
                     /sex="male"
                     /tissue_type="liver"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male liver"
ORIGIN



Query Match 22.1%; Score 446; DB 10; Length 510; 

Best Local Similarity 100.0%; Pred. No. 6.1e-98; 

Matches 446; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   64 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 123

Qy   61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  124 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 183

Qy  121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  184 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 243

Qy  181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  244 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 303

Qy  241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  304 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 363

Qy  301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  364 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 423

Qy  361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  424 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 483

Qy  421 CCTCAGCTGGTGAGGAAGTGCGTTGC 446
        ||||||||||||||||||||||||||
Db  484 CCTCAGCTGGTGAGGAAGTGCGTTGC 509



RESULT 8
AI157365
LOCUS       AI157365     511 bp    mRNA    linear   EST 30-SEP-1998
DEFINITION  ui45h01.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
            IMAGE:1885393 5', mRNA sequence.
ACCESSION   AI157365
VERSION     AI157365.1  GI:3685834
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 511)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:969717
            Seq primer: custom primer used
            High quality sequence stop: 480.
FEATURES             Location/Qualifiers
     source          1..511
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1885393"
                     /dev_stage="embryo, 14 dpc"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse embryo mewa"
                     /note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
                     Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
                     with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
                     double-stranded cDNA was ligated to a DraIII adaptor
                     [TGTTGGCCTACTGG], digested and cloned into distinct DraIII
                     sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
                     CACCATGTG). XhoI should be used to isolate the cDNA
                     insert. Size selection was performed to exclude fragments
                     <1.5kb. Library constructed by Dr. Sumio Sugano
                     (University of Tokyo Institute of Medical Science).
                     Custom primers for sequencing: 5' end primer
                     CTTCTGCTCTAAAAGCTGCG and 3' end primer
                     CGACCTGCAGCTCGAGCACA."
ORIGIN

Query Match 21.9%; Score 442; DB 9; Length 511; 

Best Local Similarity 99.3%; Pred. No. 5.9e-97; 

Matches 455; Conservative 0; Mismatches 0; Indels 3; Gaps 1; 

Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   54 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 113

Qy   61 TC---GGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 117
        ||   |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  114 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 173

Qy  118 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 177
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  174 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 233

Qy  178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  234 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 293

Qy  238 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 297
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  294 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 353

Qy  298 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 357
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  354 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 413

Qy  358 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 417
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  414 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 473

Qy  418 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCG 455
        ||||||||||||||||||||||||||||||||||||||
Db  474 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCG 511
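
The three-line summary that precedes each alignment is a sequence of "name value;" pairs, so it can be read into a dictionary for checking or tabulation. The following is a minimal Python sketch; the regular expression and the function name are illustrative assumptions, and the parser is only intended for summary text laid out like the blocks in this listing.

```python
import re

# A field name (letters, dots, spaces), whitespace, a number, optional '%', then ';'.
PAIR_RE = re.compile(
    r"([A-Za-z][A-Za-z. ]*?)\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)%?;"
)


def parse_summary(text: str) -> dict:
    """Map each summary field name to its numeric value."""
    return {name: float(value) for name, value in PAIR_RE.findall(text)}


summary = (
    "Query Match 21.9%; Score 442; DB 9; Length 511;\n"
    "Best Local Similarity 99.3%; Pred. No. 5.9e-97;\n"
    "Matches 455; Conservative 0; Mismatches 0; Indels 3; Gaps 1;"
)
stats = parse_summary(summary)
print(stats["Score"], stats["Pred. No."], stats["Matches"])
```

Percent signs are dropped, so "Query Match" and "Best Local Similarity" come back as plain percentages (21.9 and 99.3 for RESULT 8).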



RESULT 9
AI151811
LOCUS       AI151811     500 bp    mRNA    linear   EST 30-SEP-1998
DEFINITION  ui46c10.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
            IMAGE:1885458 5', mRNA sequence.
ACCESSION   AI151811
VERSION     AI151811.1  GI:3680280
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 500)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:969782
            Seq primer: custom primer used
            High quality sequence stop: 499.
FEATURES             Location/Qualifiers
     source          1..500
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1885458"
                     /dev_stage="embryo, 14 dpc"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse embryo mewa"
                     /note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
                     Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
                     with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
                     double-stranded cDNA was ligated to a DraIII adaptor
                     [TGTTGGCCTACTGG], digested and cloned into distinct DraIII
                     sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
                     CACCATGTG). XhoI should be used to isolate the cDNA
                     insert. Size selection was performed to exclude fragments
                     <1.5kb. Library constructed by Dr. Sumio Sugano
                     (University of Tokyo Institute of Medical Science).
                     Custom primers for sequencing: 5' end primer
                     CTTCTGCTCTAAAAGCTGCG and 3' end primer
                     CGACCTGCAGCTCGAGCACA."
ORIGIN

Query Match 20.9%; Score 422.4; DB 9; Length 500; 

Best Local Similarity 99.1%; Pred. No. 3.6e-92; 

Matches 436; Conservative 0; Mismatches 1; Indels 3; Gaps 1; 
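
The two percentages in the statistics block above can be reproduced from the other numbers in the report. The sketch below is an illustration only: the function names are hypothetical, and the formulas are inferred from the printed values (score divided by the perfect score of 2019, and matches divided by the number of aligned columns) rather than taken from the search program's documentation.

```python
# Sketch only: names are hypothetical and the formulas are inferred from
# the numbers printed in this report, not from the search program itself.

PERFECT_SCORE = 2019  # "Perfect score" reported in the search header


def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Alignment score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns (assumed definition)."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Values from the statistics block of this hit.
    print(query_match_pct(422.4))
    print(best_local_similarity_pct(436, 1, 3))
```

The same two functions reproduce the percentages printed for the other hits in this listing when fed their Score, Matches, Mismatches and Indels values.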

Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 120

Qy   61 TCG---GGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 117
        |||   ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 180

Qy  118 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 177
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 240

Qy  178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 300

Qy  238 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 297
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 AGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAG 360

Qy  298 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 357
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 420

Qy  358 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 417
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 GGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT 480

Qy  418 ACGCCTCAGCTGGTGAGGAA 437
        |||||||||||||||| |||
Db  481 ACGCCTCAGCTGGTGAAGAA 500



RESULT 10 
AI597406 

LOCUS AI597406 398 bp mRNA linear EST 21-APR-1999 

DEFINITION vj35a03.y1 Stratagene mouse diaphragm (#937303) Mus musculus cDNA
clone IMAGE:930988 5', mRNA sequence.
ACCESSION AI597406
VERSION AI597406.1 GI:4606454
KEYWORDS EST.

SOURCE Mus musculus (house mouse) 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 
Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 398) 

AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999 

JOURNAL Unpublished (1999) 
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999 

Washington University School of Medicine 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA 
Tel: 314 286 1800 
Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL ; contact the 
IMAGE Consortium (info@image.llnl.gov) for further information. 
MGI: 535908 

This read is a RESEQUENCE of a previously sequenced mouse clone 
This read has been verified (found to hit its original self in the 
correct orientation) 
Seq primer: -40RP from Gibco 
High quality sequence stop: 374. 
FEATURES Location/Qualifiers 
source 1..398
/organism="Mus musculus"
/mol_type="mRNA"
/db_xref="taxon:10090"
/clone="IMAGE:930988"
/tissue_type="diaphragm"
/dev_stage="adult"
/lab_host="SOLR (kanamycin resistant)"
/clone_lib="Stratagene mouse diaphragm (#937303)"
/note="Organ: diaphragm; Vector: pBluescript SK-; Site_1:
EcoRI; Site_2: XhoI; Cloned unidirectionally from mRNA
prepared from diaphragm muscle. Primer: Oligo dT. Average
insert size: 1.5 kb. Uni-ZAP XR Vector; ~5' adaptor
sequence: 5' GAATTCGGCACGAG 3' ~3' adaptor sequence: 5'
CTCGAGTTTTTTTTTTTTTTTTTT 3'"

ORIGIN 



Query Match 19.7%; Score 398; DB 9; Length 398; 

Best Local Similarity 100.0%; Pred. No. 2.9e-86; 

Matches 398; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy 1018 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1077
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 60

Qy 1078 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1137
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 120

Qy 1138 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 180

Qy 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 240

Qy 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 300

Qy 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 360

Qy 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTC 1415
        ||||||||||||||||||||||||||||||||||||||
Db  361 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTC 398



RESULT 11 

AK008188 

LOCUS AK008188 586 bp mRNA linear HTC 20-SEP-2003
DEFINITION Mus musculus adult male small intestine cDNA, RIKEN full-length
enriched library, clone:2010011G12 product:ATP-BINDING CASSETTE,
SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full
insert sequence.
ACCESSION AK008188
VERSION AK008188.1 GI:12842221
KEYWORDS HTC; CAP trapper.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1
AUTHORS Carninci,P. and Hayashizaki,Y.
TITLE High-efficiency full-length cDNA cloning
JOURNAL Meth. Enzymol. 303, 19-44 (1999)
MEDLINE 99279253
PUBMED 10349636
REFERENCE 2
AUTHORS Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
TITLE Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new genes
JOURNAL Genome Res. 10 (10), 1617-1630 (2000)
MEDLINE 20499374
PUBMED 11042159
REFERENCE 3
AUTHORS Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
TITLE RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer
JOURNAL Genome Res. 10 (11), 1757-1771 (2000)
MEDLINE 20530913
PUBMED 11076861
REFERENCE 4
AUTHORS The RIKEN Genome Exploration Research Group Phase II Team and the
FANTOM Consortium.
TITLE Functional annotation of a full-length mouse cDNA collection
JOURNAL Nature 409, 685-690 (2001)
REFERENCE 5
AUTHORS The FANTOM Consortium and the RIKEN Genome Exploration Research
Group Phase I & II Team.
TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs
JOURNAL Nature 420, 563-573 (2002)
REFERENCE 6 (bases 1 to 586)
AUTHORS Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
TITLE Direct Submission
JOURNAL Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
Physical and Chemical Research (RIKEN), Laboratory for Genome
Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
Fax:81-45-503-9216)
COMMENT Please visit our web site (http://genome.gsc.riken.go.jp/) for
further details.
cDNA library was prepared and sequenced in Mouse Genome
Encyclopedia Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in RIKEN.
Division of Experimental Animal Research in Riken contributed to
prepare mouse tissues. First strand cDNA was primed with a primer
[5' GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse transcriptase
and subsequently enriched for full-length by cap-trapper. cDNA went
through one round of normalization to Rot = 5.0 and subtraction to
Rot = 20.0. Second strand cDNA was prepared with the primer adapter
of sequence [5'
GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA was cleaved
with XhoI and SstI. Cloning sites, 5' end: XhoI; 3' end: SstI.
Host: SOLR.
FEATURES Location/Qualifiers
source 1..586
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="FANTOM_DB:2010011G12"
/db_xref="MGI:1897592"
/db_xref="taxon:10090"
/clone="2010011G12"
/sex="male"
/tissue_type="small intestine"
/clone_lib="RIKEN full-length enriched mouse cDNA library"
/dev_stage="adult"
CDS <1..306
/note="unnamed protein product; ATP-BINDING CASSETTE,
SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus]
(SWISSPROT|Q9DBM0, evidence: FASTY, 92%ID, 96.7%length,
match=1796)
putative"
/codon_start=1
/protein_id="BAC25204.1"
/db_xref="GI:26359608"
/translation="PAGFMINLDNLWIVPAWISKLSFLRWCFSGVMQIQFNGHLYTTQ
IGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"

ORIGIN 



Query Match 15.0%; Score 303.4; DB 11; Length 586; 

Best Local Similarity 99.7%; Pred. No. 5.5e-63; 

Matches 304; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 



Qy 1715 CTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGC 1774
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2 CTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGC 61

Qy 1775 TGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTT 1834
        ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Db   62 TGTCGTTCCTCCGGTGGTGCTTCTCGGGGGTGATGCAGATTCAATTTAATGGACACCTTT 121

Qy 1835 ACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCA 1894
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  122 ACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCA 181

Qy 1895 TGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACG 1954
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  182 TGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACG 241

Qy 1955 GCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACT 2014
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  242 GCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACT 301

Qy 2015 GGTGA 2019
        |||||
Db  302 GGTGA 306



RESULT 12 

BY708144 

LOCUS BY708144 581 bp mRNA linear EST 16-DEC-2002
DEFINITION BY708144 RIKEN full-length enriched, adult male small intestine Mus
musculus cDNA clone 2010011G12 5', mRNA sequence.
ACCESSION BY708144
VERSION BY708144.1 GI:27119328
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 581)
AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
Rogers,J., Birney,E. and Hayashizaki,Y.
TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs
JOURNAL Nature 420, 563-573 (2002)
MEDLINE 22354683
PUBMED 12466851
COMMENT Contact: Yoshihide Hayashizaki
Laboratory for Genome Exploration Research Group, RIKEN Genomic
Sciences Center (GSC), Yokohama Institute
The Institute of Physical and Chemical Research (RIKEN)
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
Tel: 81-45-503-9222
Fax: 81-45-503-9216
Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/
Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
Direct Submission
Computational Analysis of Full-Length Mouse cDNAs Compared with
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new
genes. Genome Res. 10 (10), 1617-1630 (2000)
RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer. Genome Res.
10 (11), 1757-1771 (2000)
Computer-based methods for the mouse full-length cDNA
encyclopedia: real-time sequence clustering for construction of a
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
cDNA library was prepared and sequenced in Mouse Genome
Encyclopedia Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in RIKEN.
Division of Experimental Animal Research in Riken contributed to
prepare mouse tissues.
Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.
FEATURES Location/Qualifiers
source 1..581
/organism="Mus musculus"
/mol_type="mRNA"
/db_xref="taxon:10090"
/clone="2010011G12"
/sex="male"
/tissue_type="small intestine"
/dev_stage="adult"
/lab_host="SOLR"
/clone_lib="RIKEN full-length enriched, adult male small
intestine"
/note="Site_1: XhoI; Site_2: SstI; cDNA library was
prepared and sequenced in Mouse Genome Encyclopedia
Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in
RIKEN. Division of Experimental Animal Research in Riken
contributed to prepare mouse tissues. 1st strand cDNA was
primed with a primer [5'
GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse
transcriptase and subsequently enriched for full-length by
cap-trapper. cDNA went through one round of normalization
to Rot = 5.0 and subtraction to Rot = 20.0. Second strand
cDNA was prepared with the primer adapter of sequence [5'
GAGAGAGAGAGCGGCCGCAATTAATTCTCGAGTTAATTAAATTAATCCCCCCCCCCC
3']. cDNA was cleaved with XhoI and SstI."



ORIGIN 



Query Match 14.8%; Score 298.4; DB 13; Length 581; 

Best Local Similarity 99.7%; Pred. No. 9.1e-62; 

Matches 299; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1720 GGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCG 1779
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCG 60

Qy 1780 TTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACC 1839
        |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Db   61 TTCCTCCGGTGGTGCTTCTCGGGGGTGATGCAGATTCAATTTAATGGACACCTTTACACC 120

Qy 1840 ACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGAC 1899
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGAC 180

Qy 1900 CTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTC 1959
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 CTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTC 240

Qy 1960 CTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 CTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 300



RESULT 13 

CB502603/C 

LOCUS CB502603 781 bp mRNA linear EST 16-MAY-2003
DEFINITION ssalmge503002 gut Salmo salar cDNA, mRNA sequence.
ACCESSION CB502603
VERSION CB502603.1 GI:29313829
KEYWORDS EST.
SOURCE Salmo salar (Atlantic salmon)
ORGANISM Salmo salar
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Euteleostei;
Protacanthopterygii; Salmoniformes; Salmonidae; Salmo.
REFERENCE 1 (bases 1 to 781)
AUTHORS GRASP Consortium, Davidson,W.S., Koop,B.F. and
http://web.uvic.ca/cbr/grasp.
TITLE A survey of Salmo salar transcripts from high complexity cDNA
libraries
JOURNAL Unpublished (2002)
COMMENT Contact: Koop BF
Centre for Biomedical Research
University of Victoria
PO Box 3020 STN CSC, Victoria BC, V8W 3N5, Canada
Tel: 250 472 4067
Fax: 250 472 4075
Email: bkoop@uvic.ca
Genome Sciences Centre, BC Cancer Agency cDNA preparation,
sequencing and bioinformatics: Y Butterfield, R Kirkpatrick, J
Asano, N Girn, R Guin, D Lee, S Lee, T Olson, P Pandoh, A Prahbu, D
Smailus, L Spence, J Stott, S Taylor, G Yang, J Schein, S Jones and
M Marra.
POLYA=Yes.
FEATURES Location/Qualifiers
source 1..781
/organism="Salmo salar"
/mol_type="mRNA"
/strain="McConnell"
/db_xref="taxon:8030"
/clone_lib="gut"
/note="Vector: pBlueScriptIISK+; Library Creator: Matthew
L Rise; Atlantic salmon tissue contributors: Carlo Biagi,
Mitch Uh and Robert Devlin (DFO, Vancouver, B.C.), Simon
Jones (PBS, Nanaimo, B.C.), Seaspring Hatchery (Crofton,
B.C.), Rachel Roper (University of Victoria)"

ORIGIN 

Query Match 13.9%; Score 280.8; DB 14; Length 781; 

Best Local Similarity 66.2%; Pred. No. 2.2e-57; 

Matches 405; Conservative 0; Mismatches 207; Indels 0; Gaps 0; 

Qy 1408 TGTCACTCGGAGAGGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGT 1467
        |||||| | |||||  | ||| ||||| |||||||||| |||||  |||| |  |
Db  769 TGTCACACAGAGAGAGCTATGTTGTACCATGAGCTGGAGGACGGCATGTATAACGTCACA 710

Qy 1468 CCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCATCATC 1527
         | || |||||||||||| |||| || ||  | || ||||||||||  | |    |  ||
Db  709 TCCTACTTCTTTGCCAAGGTCCTGGGGGAGCTTCCAGAGCACTGTGTGTTCACGTTGGTC 650

Qy 1528 TACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTA 1587
        ||||   | ||||||||||||||| |   ||||   |  |  || || | |||||| ||
Db  649 TACGGCCTACCCATCTACTGGCTGGCTGGCCTGAACCAGGCCCCGGACCGCTTCCTGCTC 590

Qy 1588 CACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCT 1647
         |||||||||| |||||| |  |||| | |||| || | | |||||| |||  ||   ||
Db  589 AACTTCCTGCTGGTGTGGCTCATGGTGTACTGCAGCCGCAGCATGGCTCTGTTTGTGGCT 530

Qy 1648 GCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTC 1707
        ||     | |||||| | || |  ||  ||||| |  ||||| | || | ||     |||
Db  529 GCAGCCTTACCCACCCTGCAGACATCAGCCTTCATGGGCAATTCTCTGTTCACTGTGTTC 470

Qy 1708 TACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATC 1767
        ||||||||||  |||||| | || | | |||| ||| ||||| | ||| |  | ||| ||
Db  469 TACCTTACTGGAGGCTTCGTCATTAGCCTGGAGAACATGTGGTTCGTGGCGTCCTGGTTC 410

Qy 1768 TCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGA 1827
        ||| |    || ||| | || ||| ||||   |||  || ||||| | || || |  |||
Db  409 TCCCATGCCTCCTTCATGCGCTGGGGCTTTGAGGGCATGCTGCAGGTCCAGTTCAGGGGA 350

Qy 1828 CACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATC 1887
        | |   ||| ||      ||||||||||||||| ||  ||||   ||        || |
Db  349 CTCAAGTACCCCGTCTCCATCGGCAACTTCACCCTCAACATCGATGGCATACATGTGGTG 290

Qy 1888 AGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATC 1947
           || |||||  |||||  | | || |||||  |   ||||||  || ||||||   ||
Db  289 GAAGCTATGGATATGAACCAGTACCCTCTCTACTCCTGCTACCTGGTTCTCATCGCTGTC 230

Qy 1948 AGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATT 2007
              |||||| || | || ||||| |||||  | ||  ||||||| |||||||| |
Db  229 GTAGTGGGCTTCATGCTGCTCTACTACCTATCACTCAAATTCATCAAGCAGAAGTCCAGC 170

Qy 2008 CAAGACTGGTGA 2019
        || |||||||||
Db  169 CAGGACTGGTGA 158



RESULT 14 
AI574075 

LOCUS AI574075 435 bp mRNA linear EST 29-MAR-1999 

DEFINITION uj67h11.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1925061 5', mRNA sequence.
ACCESSION AI574075
VERSION AI574075.1 GI:4537449
KEYWORDS EST.

SOURCE Mus musculus (house mouse) 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 435) 

AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999 

JOURNAL Unpublished (1999) 
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999 

Washington University School of Medicine 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL ; contact the 
IMAGE Consortium (info@image.llnl.gov) for further information. 
MGI: 981353 

Seq primer: custom primer used 
High quality sequence stop: 432. 
FEATURES Location/Qualifiers 
source 1..435
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL"
/db_xref="taxon:10090"
/clone="IMAGE:1925061"
/sex="female"
/dev_stage="adult"
/lab_host="DH10B"
/clone_lib="Sugano mouse liver mlia"
/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
primer CGACCTGCAGCTCGAGCACA."

ORIGIN 

Query Match 13.7%; Score 275.8; DB 9; Length 435; 

Best Local Similarity 99.3%; Pred. No. 2.6e-56; 

Matches 277; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 
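
The Score values in this listing are consistent with a simple additive scheme, sketched below as a cross-check. The function name is hypothetical and the parameters are assumptions: a match is taken as +1, the mismatch penalty of 0.6 is inferred from the printed scores (it is not stated in the report), and each gap is charged Gapop once plus Gapext per inserted or deleted base, using the Gapop 10.0 and Gapext 1.0 values from the search header.

```python
# Sketch only: the +1 match value and the 0.6 mismatch penalty are
# inferred from the scores printed in this report; they are assumptions,
# not documented settings of the IDENTITY_NUC scoring table.

def reconstructed_score(matches, mismatches, indels, gaps,
                        mismatch_penalty=0.6, gap_open=10.0, gap_ext=1.0):
    """Rebuild an alignment score from the counts in a statistics block."""
    gap_cost = gap_open * gaps + gap_ext * indels
    return round(matches - mismatch_penalty * mismatches - gap_cost, 1)


if __name__ == "__main__":
    # Counts from this hit: 277 matches, 2 mismatches, no gaps.
    print(reconstructed_score(277, 2, 0, 0))
    # Counts from a gapped hit in this listing: 3 indels in 1 gap.
    print(reconstructed_score(436, 1, 3, 1))
```

Under these assumptions the function returns the Score printed for each hit in this section, which is a quick way to catch transcription errors in the counts.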

Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 61

Qy   61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   62 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 121

Qy  121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  122 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 181

Qy  181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  182 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 241

Qy  241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTC 279
        |||||||||||||||||||||||||||||||||  ||||
Db  242 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTACACTTC 280



RESULT 15 

BX482362 

LOCUS BX482362 334 bp mRNA linear EST 04-SEP-2003
DEFINITION DKFZp686F02230_r1 686 (synonym: hlcc3) Homo sapiens cDNA clone
DKFZp686F02230 5', mRNA sequence.
ACCESSION BX482362
VERSION BX482362.1 GI:31942182
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 334)
AUTHORS Koehrer,K., Beyer,A., Mewes,H.W., Weil,B., Amid,C., Osanger,A.,
Fobo,G., Han,M. and Wiemann,S.
TITLE EST (Koehrer,K., Beyer,A., Mewes,H.W., Weil,B., Amid,C., et al.)
JOURNAL Unpublished (2003)
COMMENT Contact: MIPS
MIPS
Ingolstaedter Landstr.1, D-85764 Neuherberg, Germany
This is the 5' sequence of the clone insert
Clone from S. Wiemann, Molecular Genome Analysis, German Cancer
Research Center (DKFZ); Email s.wiemann@dkfz-heidelberg.de;
sequenced by BMFZ (Biomedical Research Center at the Heinrich-
Heine-University, Duesseldorf/Germany) within the cDNA sequencing
consortium of the German Genome Project. No si sequence available.
This clone (DKFZp686F02230) is available at the RZPD in Berlin.
Please contact the RZPD: Ressourcenzentrum, Heubnerweg 6, 14059
Berlin-Charlottenburg, GERMANY; Email: clone@rzpd.de.
FEATURES Location/Qualifiers
source 1..334
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="DKFZp686F02230"
/dev_stage="adult"
/lab_host="DH10B"
/clone_lib="686 (synonym: hlcc3)"
/note="Vector: pTriplEx2; Site_1: SfiIA; Site_2: SfiIB;
cDNA-collection"



ORIGIN 



Query Match 12.6%; Score 254; DB 13; Length 334;

Best Local Similarity 85.0%; Pred. No. 4.7e-51;

Matches 284; Conservative 0; Mismatches 50; Indels 0; Gaps 0;



Qy   1189 CAGTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTG 1248
          |||||| | || |||| |||||||||||||||||| |||||||| ||||||||||| ||
Db      1 CAGTTTACGACGCTGAGCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTC 60

Qy   1249 CTCATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGC 1308
          ||||| |||||| |||| ||||| ||||||||  | | ||| |||||||| || |  |||
Db     61 CTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGC 120

Qy   1309 CATGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCG 1368
          ||||||  ||  ||||||||||||||||| ||||| |||||| | |||||||| || ||
Db    121 CATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCT 180

Qy   1369 CTCATTCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATG 1428
          ||||| |||||||| ||||| ||||||||| ||||||||||| |||| |||||| |||||
Db    181 CTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATG 240

Qy   1429 CTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATC 1488
          || |||||||| ||||||||||||||||||||  ||||||| ||||||||||||||||||
Db    241 CTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATC 300

Qy   1489 CTAGGAGAATTGCCGGAGCACTGTGCCTACGTCA 1522
          || || ||  | |||||||||||||||||| |||
Db    301 CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA 334
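
The percentages reported for this hit appear to follow from the counts printed in the same block: Query Match is consistent with Score divided by the query's perfect score (254 / 2019), and Best Local Similarity with Matches divided by the aligned length (284 / 334). The Python sketch below checks that reading. The function names are illustrative, and the two formulas are an inference from the numbers in this report, not a documented GenCore definition.

```python
def count_identities(qy: str, db: str) -> tuple[int, int]:
    """Return (matches, mismatches) for two gap-free aligned rows of equal length."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for a, b in zip(qy, db) if a == b)
    return matches, len(qy) - matches


def percent(part: float, whole: float) -> float:
    """Percentage rounded to one decimal place, the precision used in the report."""
    return round(100.0 * part / whole, 1)


# Last alignment row of the hit above (Qy 1489-1522 against Db 301-334).
qy_row = "CTAGGAGAATTGCCGGAGCACTGTGCCTACGTCA"
db_row = "CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA"

print(count_identities(qy_row, db_row))  # identities in this one row only
print(percent(254, 2019))                # assumed: Score / Perfect score
print(percent(284, 334))                 # assumed: Matches / aligned length
```

Summing the per-row mismatch counts over all six rows of the alignment should give the Mismatches value reported for the hit.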



Search completed: February 26, 2004, 09:39:24 
Job time : 3419.29 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: February 26, 2004, 00:40:23 ; Search time 5185.97 Seconds 

(without alignments) 
16874.299 Million cell updates/sec 



Title: US-09-989-981A-3 
Perfect score: 2019 

Sequence: 1 atggctgagaaaaccaaaga agtcaattcaagactggtga 2019 



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 3470272 seqs, 21671516995 residues 

Total number of hits satisfying chosen parameters: 6940544 



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : GenEmbl:*
 1: gb_ba:*
 2: gb_htg:*
 3: gb_in:*
 4: gb_om:*
 5: gb_ov:*
 6: gb_pat:*
 7: gb_ph:*
 8: gb_pl:*
 9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*
28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*
37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
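
The header above names a Smith-Waterman ("sw") model with the IDENTITY_NUC table, Gapop 10.0 and Gapext 1.0. As a rough illustration of how a local-alignment score of this kind can be computed, the Python sketch below implements the affine-gap recurrence in its usual Gotoh form. It is not GenCore's code. The match score of 1, the mismatch penalty of 0.6, and the convention that a gap of k bases costs Gapop + k x Gapext are all assumptions: none is stated in this report, although they happen to reproduce scores printed in it (for example 284 - 0.6 x 50 = 254 for a gap-free hit with 284 matches and 50 mismatches).

```python
def sw_score(query: str, target: str,
             match: float = 1.0, mismatch: float = -0.6,
             gap_open: float = 10.0, gap_ext: float = 1.0) -> float:
    """Best local alignment score (Smith-Waterman, affine gaps, Gotoh form).

    Assumed scoring (not taken from the report): +match for identical bases,
    +mismatch otherwise, and gap_open + k * gap_ext for a gap of k bases.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0.0] * cols        # H: best score ending at each cell, previous row
    f_prev = [neg_inf] * cols    # F: best score ending in a vertical gap, previous row
    best = 0.0
    for q in query:
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # E: best score ending in a horizontal gap, this row
        for j in range(1, cols):
            # Extend an existing gap or open a new one.
            e = max(e - gap_ext, h_cur[j - 1] - gap_open - gap_ext)
            f_cur[j] = max(f_prev[j] - gap_ext, h_prev[j] - gap_open - gap_ext)
            diag = h_prev[j - 1] + (match if q == target[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# Two short rows taken from an alignment in this report (34 bases each).
qy_row = "CTAGGAGAATTGCCGGAGCACTGTGCCTACGTCA"
db_row = "CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA"
print(sw_score(qy_row, db_row))
```

The sketch keeps only two rows of the dynamic-programming matrix, so memory grows with the target length, not with the product of both lengths. It returns the score only; recovering the alignment itself would need a traceback.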

                              SUMMARIES

                 %
Result         Query
 No.    Score  Match  Length  DB  ID           Description

   1     2019  100.0    2019   6  AX685731     AX685731 Sequence
   2     2019  100.0    2284  10  AY196216     AY196216 Mus muscu
   3     2006   99.4    3674  10  AF324495     AF324495 Mus muscu
   4   1999.8   99.0    2285  10  AY196215     AY196215 Mus muscu
   5   1727.8   85.6    4829  10  AF351785     AF351785 Rattus no
   6     1430   70.8    2669   6  AX.DO0 / OJ
   7   1428.4   70.7    2022   9  AF320294     AF320294 Homo sapi
   8   1428.4   70.7    2679   9  AF324494     AF324494 Homo sapi
   9    743.8   36.8    3239   6  AX478099     AX478099 Sequence
c 10    302.2   15.0  204584  10  AC122243     AC122243 Mus muscu
  11    275.8   13.7    1387  10  F351799S06   AF351804 Mus muscu
  12    270.6   13.4    1378  10  F351799S11   AF351809 Mus muscu
  13    264.8   13.1  237445   2  AC120701     AC120701 Rattus no
c 14    264.8   13.1  312858   2  AC112747     AC112747 Rattus no
c 15    264.2   13.1   40929  10  AY145899     AY145899 Rattus no
  16    245.2   12.1    1470  10  F351799S04   AF351802 Mus muscu
  17    225.8   11.2  207760   2  AC146286     AC146286 Callicebu
  18    225.6   11.2     660   9  F351812S06   AF351817 Homo sapi
  19    225.6   11.2  127066   9  AC084265     AC084265 Homo sapi
  20    225.6   11.2  139342   9  AC108476     AC108476 Homo sapi
c 21    222.4   11.0  202533   2  AC146464     AC146464 Saimiri s
  22    219.4   10.9  178016   2  AC146787     AC146787 Aotus nan
  23    219.4   10.9  185045   2  AC146466     AC146466 Callithri
c 24    216.8   10.7  159346   2  AC145533     AC145533 Lemur cat
c 25      212   10.5   68166   2  AC084712     AC084712 Homo sapi
  26    206.2   10.2    1292   9  F351812S11   AF351822 Homo sapi
  27    205.2   10.2     642  10  F351799S09   AF351807 Mus muscu
  28    205.2   10.2  182261   2  AC087053     AC087053 Homo sapi
  29    199.2    9.9    1920   6  AX456519     AX456519 Sequence
  30    199.2    9.9    2340   6  AX320883     AX320883 Sequence
  31    199.2    9.9    2340   6  AX685733     AX685733 Sequence
  32    199.2    9.9    2340   9  AF320293     AF320293 Homo sapi
  33    199.2    9.9    2516   6  AX456520     AX456520 Sequence



  34    199.2    9.9    2740   9  AF312715     AF312715 Homo sapi
  35      196    9.7                           AX456526 Sequence
  36             9.7    2470      AF312714     AF312714 Rattus no
  37    189.4    9.4           2  AC145533     AC145533 Lemur cat
  38    188.2    9.3    2351  10  AY195872     AY195872 Mus muscu
  39              9.    2351  10  AY195873     AY195873 Mus muscu
  40    186.6    9.2           6  MAfl jDjlj




  41    186.6    9.2    1959   6  AX685729     AX685729 Sequence
  42    186.6    9.2    2258   6  AX320881     AX320881 Sequence
  43    186.6    9.2    2354   6  AX456524     AX456524 Sequence
  44    186.6    9.2    2354  10  AF312713     AF312713 Mus muscu
c 45      185    9.2   64889   2  AC120532     AC120532 Oryza sat



ALIGNMENTS 



RESULT 1
AX685731
LOCUS       AX685731 2019 bp DNA linear PAT 29-MAR-2003
DEFINITION  Sequence 3 from Patent WO02081691.
ACCESSION   AX685731
VERSION     AX685731.1 GI:29371740
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 3 17-OCT-2002;
            Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES             Location/Qualifiers
     source          1..2019
                     /organism="Mus musculus"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:10090"
     CDS             1..2019
                     /note="unnamed protein product; mouse ABCG8 (mABCG8)"
                     /codon_start=1
                     /protein_id="CAD86571.1"
                     /db_xref="GI:29371741"
                     /db_xref="REMTREMBL:CAD86571"
                     /translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS
                     NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA
                     IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN
                     LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER
                     EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP
                     GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL
                     LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
                     IYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALY
                     NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL
                     GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"



ORIGIN 



Query Match 100.0%; Score 2019; DB 6; Length 2019; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2019; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

Qy     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

Qy    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

Qy    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

Qy    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300

Qy    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360

Qy    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420

Qy    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

Qy    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

Qy    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600

Qy    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660

Qy    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720

Qy    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780

Qy    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

Qy    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900

Qy    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960

Qy    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020

Qy   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080

Qy   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

Qy   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200

Qy   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260

Qy   1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320

Qy   1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380

Qy   1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440

Qy   1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500

Qy   1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560

Qy   1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620

Qy   1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680

Qy   1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740

Qy   1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800

Qy   1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860

Qy   1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920

Qy   1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980

Qy   1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
          |||||||||||||||||||||||||||||||||||||||
Db   1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019





RESULT 2
AY196216
LOCUS       AY196216 2284 bp mRNA linear ROD 01-JUN-2003
DEFINITION  Mus musculus strain PERA/Ei ATP-binding cassette sub-family G
            member 8 (Abcg8) mRNA, complete cds.
ACCESSION   AY196216
VERSION     AY196216.1 GI:31322261
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 2284)
  AUTHORS   Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
            Paigen,B.
  TITLE     Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
            Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
            Mice
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 2284)
  AUTHORS   Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-DEC-2002) The Jackson Laboratory, 600 Main Street,
            Bar Harbor, ME 04609, USA
FEATURES             Location/Qualifiers
     source          1..2284
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="PERA/Ei"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="55 cM"
                     /sex="male"
                     /tissue_type="liver"
     gene            1..2284
                     /gene="Abcg8"
     CDS             102..2120
                     /gene="Abcg8"
                     /note="ATP-dependent canalicular cholesterol transporters-
                     white subfamily"
                     /codon_start=1
                     /product="ATP-binding cassette sub-family G member 8"
                     /protein_id="AAO45096.1"
                     /db_xref="GI:31322262"
                     /translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS
                     NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA
                     IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN
                     LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER
                     EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP
                     GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL
                     LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
                     IYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALY
                     NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL
                     GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"

ORIGIN 



Query Match 100.0%; Score 2019; DB 10; Length 2284; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2019; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    102 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 161

Qy     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    162 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 221

Qy    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    222 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 281

Qy    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    282 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 341

Qy    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    342 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 401

Qy    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    402 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 461

Qy    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    462 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 521

Qy    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    522 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 581

Qy    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    582 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 641

Qy    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    642 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 701

Qy    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    702 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 761

Qy    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    762 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 821

Qy    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    822 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 881

Qy    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    882 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 941

Qy    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    942 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 1001

Qy    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1002 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 1061

Qy    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1062 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1121

Qy   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1122 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1181

Qy   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1182 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1241

Qy   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1242 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1301

Qy   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1302 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1361

Qy   1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1362 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1421

Qy   1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1422 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1481

Qy   1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1482 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1541

Qy   1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1542 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1601

Qy   1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1602 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1661

Qy   1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1662 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1721

Qy   1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1722 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1781

Qy   1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1782 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1841

Qy   1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1842 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1901

Qy   1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1902 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1961

Qy   1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1962 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 2021

Qy   1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2022 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 2081

Qy   1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
          |||||||||||||||||||||||||||||||||||||||
Db   2082 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2120
LOCUS       AF324495 3674 bp mRNA linear ROD 07-AUG-2001
DEFINITION  Mus musculus sterolin-2 (Abcg8) mRNA, complete cds.
ACCESSION   AF324495
VERSION     AF324495.1 GI:15088541
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 3674)
  AUTHORS   Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
            Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
            Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
            Patel,S.B.
  TITLE     Two genes that map to the STSL locus cause sitosterolemia: genomic
            structure and spectrum of mutations involving sterolin-1 and
            sterolin-2, encoded by ABCG5 and ABCG8, respectively
  JOURNAL   Am. J. Hum. Genet. 69 (2), 278-290 (2001)
  MEDLINE   21344600
   PUBMED   11452359
REFERENCE   2 (bases 1 to 3674)
  AUTHORS   Lu,K., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-NOV-2000) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            Street, STB541, Charleston, SC 29403, USA
FEATURES             Location/Qualifiers
     source          1..3674
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6"
                     /db_xref="taxon:10090"
                     /tissue_type="liver"
     gene            1..3674
                     /gene="Abcg8"
     CDS             102..2123
                     /gene="Abcg8"
                     /note="ABCG8"
                     /codon_start=1
                     /product="sterolin-2"
                     /protein_id="AAK84079.1"
                     /db_xref="GI:15088542"
                     /translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ
                     SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML
                     AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP
                     NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE
                     RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD
                     IFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKE
                     REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL
                     PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA
                     LLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV
                     IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNAL
                     YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI
                     LGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
ORIGIN



Query Match 99.4%; Score 2006; DB 10; Length 3674; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 2019; Conservative 0; Mismatches 0; Indels 3; Gaps 1; 
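
The summary figures above can be recomputed from the match, mismatch and gap counts. The sketch below is not the search program's own code. It takes the gap penalties from the report header (Gapop 10.0, Gapext 1.0) and assumes a reward of 1 per identical base, a penalty of 0.6 per mismatch, and a gap cost of open + extend x length; the last three values are inferred from the reported numbers, not stated in the report.

```python
# Gap penalties as printed in the report header.
GAP_OPEN = 10.0
GAP_EXTEND = 1.0
# Assumptions inferred from the reported scores (not stated in the report).
MATCH_REWARD = 1.0
MISMATCH_PENALTY = 0.6
# "Perfect score: 2019" for the 2019-base query.
PERFECT_SCORE = 2019


def alignment_score(matches, mismatches, gap_lengths=()):
    """Score = matches - 0.6 * mismatches - sum(open + extend * length)."""
    gap_cost = sum(GAP_OPEN + GAP_EXTEND * n for n in gap_lengths)
    return MATCH_REWARD * matches - MISMATCH_PENALTY * mismatches - gap_cost


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches, mismatches, indels):
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * matches / (matches + mismatches + indels)


if __name__ == "__main__":
    # AF324495: 2019 matches, 0 mismatches, one gap of 3 bases.
    score = alignment_score(2019, 0, gap_lengths=[3])
    print(round(score, 1))
    print(round(query_match_percent(score), 1))
    print(round(best_local_similarity_percent(2019, 0, 3), 1))
```

Under these assumptions the same functions also reproduce the figures reported for the AY196215 hit (2007 matches, 12 mismatches, no gaps) and the AF351785 hit (1837 matches, 182 mismatches, no gaps).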



Qy 1 ATGGCT GAGAAAACCAAAGAAGAGACC CAGCT GT GGAAT GGGACTGTACTTCAGGAT GCT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 102 AT GGCT GAGAAAAC CAAAGAAGAGAC C CAGCT GT GGAAT GGGACT GTACTT C AGGAT GCT 161 

Qy 61 TC GGGC CT C CAGGACAGCTT GT T CT CCT C GGAAAGT GACAACAGTCT GTACTT C AC C 117 

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I M I I I I 

Db 162 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 221 

Qy 118 T ACAGTGGTCAGT CCAACACTCTGGAGGTCAGAGATCT CACCTACCAGGTGGACAT CGCC 177 

I | | | | I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I II M I I I I I I II I I I I I 
Db 222 T ACAGT GGTCAGTC CAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACAT CGCC 281 

Qy 178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I II I I I II I I I 

Db 282 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 341 

Qy 238 AGC CAAGACTC CT GT GAGCT GGGC AT C C GAAAT CTAAGCT T CAAAGT GAGGAGT GGACAG 2 97 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 342 AGCCAAGACTC CT GT GAGCT GGGC AT CC GAAAT CTAAGCTTCAAAGTGAGGAGTGGACAG 4 01 

Qy 298 ATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACA 357 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 402 AT GCT G GC CAT C ATAGGGAGCT CAG GCT GC GGGAGAGC CT CACT ACT C GAC GTGAT C ACA 461 

Qy 358 GGCAGAGGCCACGGT GGCAAGAT GAAATCAGGACAAATTTGGATAAAT GGGCAACCCAGT 417 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 462 GGCAGAGGC CAC GGT GGCAAGAT GAAAT CAGGACAAAT T T GGATAAAT GGGCAAC CC AGT 521 

Qy 418 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 477 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I I 
Db 522 ACGCCT CAGCT GGT GAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 581 

Qy 47 8 AACCT GAC CGT CAGAGAGAC CCT GGCT TT CAT T GC C CAGAT GCGCCTGCC CAGGACCTT C 537 

I I I I I I I I I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I I I II I I 
Db 582 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTC 641 

Qy 538 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 597 

M I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 642 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 701 

Qy 598 TGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGA 657 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 702 TGCGCCAACACCAGAGT GGGCAAC AC GT AT GTACGTGGGGTGTCCGGGGGTGAGCGCCGA 761 

Qy 658 C GAGT GAGCAT T GGGGT GCAGCT CCT GT GGAACCC AGGAAT C CT CAT T CT GGAT GAAC CC 717 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I 
Db 762 C GAGT GAGCAT T GGGGT GCAGCT C CT GT GGAACCCAGGAAT CCT CAT T CT GGAT GAAC C C 821 

Qy 718 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 777 

I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
Db 822 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 881 



Qy    778 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 837
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    882 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 941

Qy    838 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 897
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    942 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 1001

Qy    898 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 957
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1002 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 1061

Qy    958 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1017
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1062 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1121

Qy   1018 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1077
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1122 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1181

Qy   1078 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1137
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1182 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1241

Qy   1138 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1242 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1301

Qy   1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1302 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1361

Qy   1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1362 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1421

Qy   1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1422 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1481

Qy   1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1482 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1541

Qy   1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1542 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1601

Qy   1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1602 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1661

Qy   1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1662 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1721

Qy   1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1722 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1781

Qy 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 1782 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1841 

Qy 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 1842 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1901 

Qy 1798 T C GGGGCT GAT GC AGAT T CAAT T TAAT GGAC AC CTTT AC ACCAC ACAAAT C GGCAACT T C 1857 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 1902 TCGGGGCT GAT GCAGAT T CAAT TTAAT GGAC AC CTTT ACAC CAC ACAAAT C GGCAACTTC 1961 

Qy 1858 ACCTT CTC C AT C CT C GGAGACAC GAT GAT CAGT GC C AT GGACCT GAACT CGC AT C CACTC 1917 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I 
Db 1962 AC CTT CT C CAT C CT CGGAGACAC GAT GAT CAGT GCC AT GGAC CT GAACTC GCAT CCACT C 2021 

Qy 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 2022 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 2081 

Qy 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2082 T CCTT GAAG CT CAT CAAACAGAAGT CAATT CAAGACT GGT GA 2123 
LOCUS       AY196215 2285 bp mRNA linear ROD 01-JUN-2003
DEFINITION  Mus musculus strain I/LnJ ATP-binding cassette sub-family G member
            8 (Abcg8) mRNA, complete cds.
ACCESSION   AY196215
VERSION     AY196215.1 GI:31322259
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 2285)
  AUTHORS   Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
            Paigen,B.
  TITLE     Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
            Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
            Mice
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 2285)
  AUTHORS   Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-DEC-2002) The Jackson Laboratory, 600 Main Street,
            Bar Harbor, ME 04609, USA
FEATURES             Location/Qualifiers
     source          1..2285
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="I/LnJ"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="55 cM"
                     /sex="male"
                     /tissue_type="liver"
     gene            1..2285
                     /gene="Abcg8"
     CDS             102..2120
                     /gene="Abcg8"
                     /note="ATP-dependent canalicular cholesterol transporter;
                     white subfamily"
                     /codon_start=1
                     /product="ATP-binding cassette sub-family G member 8"
                     /protein_id="AAO45095.1"
                     /db_xref="GI:31322260"
                     /translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS
                     NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA
                     IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN
                     LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER
                     EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAAELP
                     GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL
                     LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
                     IYAMPIYWLTNLRPVPELFLLHLLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALY
                     NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL
                     GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
ORIGIN

Query Match 99.0%; Score 1999.8; DB 10; Length 2285; 

Best Local Similarity 99.4%; Pred. No. 0; 

Matches 2007; Conservative 0; Mismatches 12; Indels 0; Gaps 0; 
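
The three summary lines of each hit are semicolon-separated name/value pairs, so they are easy to read mechanically. The following is a minimal sketch, assuming every field is a label followed by a single numeric value (optionally ending in `%`); the function name `parse_summary` is hypothetical and is not part of the search software.

```python
def parse_summary(lines):
    """Turn 'Name value; Name value;' summary lines into a dict of floats."""
    fields = {}
    for line in lines:
        for piece in line.split(";"):
            piece = piece.strip()
            if not piece:
                continue
            # The value is the last whitespace-separated token; the rest is the label.
            label, value = piece.rsplit(None, 1)
            fields[label] = float(value.rstrip("%"))
    return fields


if __name__ == "__main__":
    summary = parse_summary([
        "Query Match 99.0%; Score 1999.8; DB 10; Length 2285;",
        "Best Local Similarity 99.4%; Pred. No. 0;",
        "Matches 2007; Conservative 0; Mismatches 12; Indels 0; Gaps 0;",
    ])
    # With no indels, matches plus mismatches should cover the whole query.
    print(summary["Matches"] + summary["Mismatches"])
```

For this hit the sum equals the query length of 2019, which is a quick consistency check on a gap-free alignment.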

1 AT GGCTGAGAAAACCAAAGAAGAGACCCAGCT GT GGAAT GGGACT GTACTTCAGGATGCT 60 
M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I 
102 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 161 

61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I 
162 TCGGGCCTC CAGGAC AGCT T GTT CT C CT CGGAAAGT GACAACAGT CT GT ACTT CAC CT AC 221 

121 AGT GGT CAGT C CAAC ACT CT GGAGGT CAGAGAT CT CAC CT ACC AGGT GGACAT C GC CT CT 180 

II II I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I II I I I I II 
222 AGC GGT CAGT C CAAC ACT CT GGAGGT CAGAGAT CT CAC CT ACC AGGT GGACAT CGC CT CT 2 81 

181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240 

I I I I I I I I I I I I I I I I I I II I I I II I I I II I I I I I I I I I I I I M I II I I M I M I I I I I I 
2 82 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 341 

241 CAAGACT C CT GT GAGCT GGGCAT C C GAAAT CTAAGCT T CAAAGT GAGGAGT GGACAGAT G 300 

I || I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I II I I I I I I 
342 CAAGACTC CT GT GAGCT GG GCAT CC GAAAT CTAAGCT T CAAAGT GAGGAGT GGACAGAT G 401 

301 CT GGCCAT CATAGGGAGCT CAGGCTGCGGGAGAGCCT CACTACTCGACGT GAT CACAGGC 360 

I I I I I I I I I I I I I I I I I I I II I I M I I I I II I II I I I I I I I I I I I I I I I I I I I II M I II 
4 02 CT GGC CAT C AT AGGGAG CT C AGGCT GCGGGAGAGCCT CACT ACT C GAC GT GAT C ACAGG C 461 



Qy    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
          |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
Db    462 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAACGGGCAACCCAGTACG 521

Qy    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    522 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 581

Qy    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    582 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 641

Qy    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    642 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 701

Qy    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    702 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 761

Qy    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    762 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 821

Qy    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780
          ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Db    822 TCTGGCCTCGACAGCTTCACAGCCCACAACCTGGTGACAACCTTGTCCCGCCTGGCCAAG 881

Qy    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    882 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 941

Qy    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    942 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 1001

Qy    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Db   1002 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCAGACTTC 1061

Qy    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1062 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1121

Qy   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080
          |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
Db   1122 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTGCAAGGCTTTGATGACTTTCTG 1181

Qy   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1182 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1241

Qy   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200
          ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Db   1242 CAGGACACTGACTGTGGGACTGCTGCTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1301

Qy   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1302 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1361

1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 132 0 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
1362 TCGGAAGCCTGCCTGATGTCCCTCATCATCGGCTTCCTTTACTACGGCCATGGGGCCAAG 1421 

1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 138 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
1422 CAGCTCTCCTTCATGGACACGGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1481 

1381 AAT GT CAT C CTGGAT GT CGT CT C CAAAT GT CACT CGGAGAGGT CAAT GCT GT ACTAT GAG 144 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1482 AAT GT CATCCTGGATGT CGTCTCCAAATGT CACTCGGAGAGGTCAAT GCTGTACTAT GAG 1541 

1441 CT GGAAGAC GGGCT GT ACACT GCT GGT CCT TAT T TCT T T GC CAAGAT C CT AGGAGAATT G 1500 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
1542 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1601 

1501 CC GGAGCACT GT GC CT ACGT CAT CAT CTAC GC GAT GC C CAT CT ACT GGCT GACAAAC CT G 1560 

I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1602 CC GGAGCACT GT GC CT ACGT CAT CAT CT AC GC GAT GC C CAT CTACT GG CT GACAAAC CTG 1661 

1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I 
1662 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTACTGCTTGTGTGGTTGGTGGTCTTCTGC 1721 

1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1722 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1781 

1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740 

I I I I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1782 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACCGCCGGCTTCATGATAAACTTGGAC 1841 

1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 
1842 AACCTGTGGATAGTGCCTGCATGGATATCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1901 

1801 GGGCT GATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I I 
1902 GGGCTGAT GCAGATT CAATTTAATGGACACCTTTACACCACACAAATCGGCAACTT CACC 1961 

1861 T T CT C CAT CCT C GGAGACACGAT GAT CAGT GC C AT GGAC CT GAACT C GC AT CC ACT CTAT 1920 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
1962 T T CT CCAT C CT CGGAGAC AC GAT GAT CAGT GC CAT GGAC CT GAACT C GCAT CCACT CTAT 2021 

1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980 

I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2 022 GC GAT CTAC CT CATT GT CAT CGGCATCAGCTAC GGCT T CCT GT T C CT GT ACTAT CTAT CC 2081 

1981 T T GAAGCT CAT CAAACAGAAGT CAAT T CAAGACTGGT GA 2019 

I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I II I I 
2082 T T GAAGCT CAT CAAACAGAAGT CAATT CAAGACT GGT GA 212 0 
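
The bar row between each query row and database row marks the columns where the two bases agree. A minimal sketch of how such a row, and the query positions of the mismatches, can be derived from one pair of rows is given below. It assumes the two rows have equal length and have already been stripped of the stray spaces introduced by text extraction; the function names are hypothetical.

```python
def match_line(query_row, db_row):
    """Return '|' for each identical column and ' ' for each differing one."""
    if len(query_row) != len(db_row):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if q == d else " " for q, d in zip(query_row, db_row))


def mismatch_positions(query_row, db_row, query_start):
    """Return 1-based query coordinates of the columns that differ."""
    return [query_start + i
            for i, (q, d) in enumerate(zip(query_row, db_row))
            if q != d]


if __name__ == "__main__":
    # Query rows 1561-1620 against AY196215 rows 1662-1721, as listed above.
    qy = "CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC"
    db = "CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTACTGCTTGTGTGGTTGGTGGTCTTCTGC"
    print(qy)
    print(match_line(qy, db))
    print(db)
    print(mismatch_positions(qy, db, query_start=1561))
```

Summing the mismatch counts over all rows of a gap-free alignment should reproduce the "Mismatches" figure in the hit summary.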



AF351785
LOCUS       AF351785 4829 bp mRNA linear ROD 26-AUG-2002
DEFINITION  Rattus norvegicus sterolin-2 (Abcg8) mRNA, complete cds.
ACCESSION   AF351785
VERSION     AF351785.2 GI:22477145
KEYWORDS    .
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1 (bases 1 to 4829)
  AUTHORS   Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
            Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
            Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
            Patel,S.B.
  TITLE     Two genes that map to the STSL locus cause sitosterolemia: genomic
            structure and spectrum of mutations involving sterolin-1 and
            sterolin-2, encoded by ABCG5 and ABCG8, respectively
  JOURNAL   Am. J. Hum. Genet. 69 (2), 278-290 (2001)
  MEDLINE   21344600
   PUBMED   11452359
REFERENCE   2 (bases 1 to 4829)
  AUTHORS   Lu,K., Yu,H., Lee,M. and Patel,S.B.
  TITLE     Molecular cloning, genomic structure, and characterization of novel
            mouse head-to-head tandem ABC transporters
  JOURNAL   Unpublished
REFERENCE   3 (bases 1 to 4829)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29407, USA
REFERENCE   4 (bases 1 to 4829)
  AUTHORS   Lu,K., Yu,H., Lee,M. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-AUG-2002) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403, USA
  REMARK    Sequence update by submitter
COMMENT     On Aug 26, 2002 this sequence version replaced gi:15148516.
FEATURES             Location/Qualifiers
     source          1..4829
                     /organism="Rattus norvegicus"
                     /mol_type="mRNA"
                     /strain="Sprague-Dawley"
                     /db_xref="taxon:10116"
     gene            1..4829
                     /gene="Abcg8"
     CDS             111..2129
                     /gene="Abcg8"
                     /codon_start=1
                     /product="sterolin-2"
                     /protein_id="AAK84831.2"
                     /db_xref="GI:22477146"
                     /translation="MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQS
                     NTLEVRDLTYQVDMASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLA
                     IIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPN
                     LTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQ
                     EVATMEKARLIAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQTLTQDTNCGTAAELP
                     GMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAAL
                     LFMIGALIPFNVILDVVSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVI
                     IYGMPIYWLTNLRPGPELFLLHFMLLWLVVFCCRTMALAASAMLPTFHMSSFCCNALY
                     NSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVP
                     GDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW"
ORIGIN



Query Match 85.6%; Score 1727.8; DB 10; Length 4829; 

Best Local Similarity 91.0%; Pred. No. 0; 

Matches 1837; Conservative 0; Mismatches 182; Indels 0; Gaps 0; 

Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||| |||||||| ||||||||||||||||| ||||||||||| |||||||||
Db    111 ATGGCTGAGAAGACCAAAGAGGAGACCCAGCTGTGGAACGGGACTGTACTCCAGGATGCT 170

Qy     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
          ||  |||||||||||||| |||||||||| |||||||||||||| || ||||||||||||
Db    171 TCAAGCCTCCAGGACAGCGTGTTCTCCTCTGAAAGTGACAACAGCCTCTACTTCACCTAC 230

Qy    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Db    231 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATGGCCTCT 290

Qy    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
          |||||||||||||||||||||||||||||||||||| |||| ||||||||||   |||||
Db    291 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGTTACCGTGGAGGTCTCGCGGCAGC 350

Qy    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
          || |||||||| || ||||||||||||||||| |||||||||||||||||||||||||||
Db    351 CAGGACTCCTGGGATCTGGGCATCCGAAATCTGAGCTTCAAAGTGAGGAGTGGACAGATG 410

Qy    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
          ||||| |||||||||||| ||||||||||||||||| || |||||||||| |||||||||
Db    411 CTGGCTATCATAGGGAGCGCAGGCTGCGGGAGAGCCACATTACTCGACGTTATCACAGGC 470

Qy    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
          |||| ||| |||||||||||||||||||||||||| |||||||| ||||||||||| |||
Db    471 AGAGACCATGGTGGCAAGATGAAATCAGGACAAATCTGGATAAACGGGCAACCCAGCACG 530

Qy    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
          ||||||||| |   |||||| || || |||||||| ||||| ||||| ||||| ||||| 
Db    531 CCTCAGCTGATACAGAAGTGTGTGGCACATGTGCGCCAGCAAGACCAGCTGCTCCCCAAT 590

Qy    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
          ||||| ||||||||||||||| ||||||| ||||||||||||||||||| ||||||||| 
Db    591 CTGACTGTCAGAGAGACCCTGACTTTCATCGCCCAGATGCGCCTGCCCAAGACCTTCTCT 650

Qy    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
          ||||||||||| |||||||||||||||||||| || || |||||||||||||||||||||
Db    651 CAGGCCCAGCGAGACAAACGGGTGGAAGACGTGATTGCGGAGCTGCGGCTGCGGCAGTGC 710

Qy    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
          ||||||||| | ||||||||||| || ||||| |||||||||||||| ||||||||| ||
Db    711 GCCAACACCCGCGTGGGCAACACATACGTACGCGGGGTGTCCGGGGGCGAGCGCCGAAGA 770

661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720 

| | M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M II I I I I I I I 
771 GT GAGCAT CGGGGT G CAG CT C CT GT GGAACC C AGGAAT C CT CATC CT GGATGAAC C CACT 830 

721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

|| I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I Ml I II I I I I I I I I I I I I I I 
831 TCCGGCCTCGACAGCTTCACCGCTCACAACCTGGTGAGAACTTTGTCCCGCCTGGCCAAA 890 

781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840 

| | | | | | | | | | | | | I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I M I I I I I I I I I 
8 91 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 950 

841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I IN 

951 GACCTGGTCCTTCTGATGACGTCTGGCACCCCTATCTACCTGGGGGTGGCACAGCACATG 1010 

901 GTGCAGT ACT T CAC AT C CAT T GGC CAC C CTT GTC CTC GCT ATAGCAAC CCT GC GGACT T C 960 
I I I M I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1011 GTGCAGTACTTTACATCAATTGGCTACCCTTGTCCTCGCTACAGCAACCCTGCTGACTTC 1070 

961 TACGTGGACTTGACCAGCAT CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGT GGAG 1020 
I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1071 TACGTGGACTT GACGAGCATT GACAGGCGCAGCAAAGAACAGGAGGT GGCCACCAT GGAG 1130 

1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080 

M I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1131 AAGGCTCGATTACTTGCAGCCTTGTTCCTAGAAAAAGTGCAAGGCTTTGACGACTTTCTG 1190 

1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

M I I I I I I I I I I II I I I I III Mill Mill I IIMNIM IIIIIIIIM 

1191 TGGAAAGCT GAGGCAAAGAGT CTCGACACAGGCACCTATGCAGT CAGCCAGACCCTCACA 1250 

1141 CAGGACACTGACT GTGGGACT GCT GTT GAGCTGCCCGGGAT GATAGAGCAGTTTT CCACC 1200 

|| I I I I II II I I II I I I II I I I II I II I II I II I I II II I I I I I I I II I Mill 
1251 CAGGACACCAACT GTGGAACT GCT GCT GAGCTGCCCGGGATGAT ACAGCAGTTTACCACC 1310 

1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260 

I || | || | | I I II I I I I I I II I II I II I I II I M I I I I M I I I I III I I I I I I I I I 
1311 CT GAT C CGT C GT C AGAT TT C CAAT GACT T C C GGGACT TGC C CACCCT GTT CAT C CAT GGA 1370 

1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320 

| I II I II M I II I I I II I II II I I II I I I I II M II I II II I I I II I I I I I I I 
1371 GCAGAAGCCTGCCTGATGTCTCTCATCATTGGCTTCCTTTACTACGGCCACGCAGACAAG 1430 

1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380

I II I I II I I II I II II I I II I II II II I I II I II I II I II II I II I I I I I I I I 
1431 CCGCTCTCCTTCATGGACATGGCAGCCCTCCTGTTCATGATAGGAGCACTCATTCCTTTT 1490 

1381 AAT GT CAT CCT GGAT GT C GT CT C CAAAT GT CACT C GGAGAGGT CAAT GCT GT ACT AT GAG 1440 

I I II II I I I II M I I II II I II I II II II M I M II II II I I II I II I I I I I I II 
1491 AAT GT CAT T CT GGAT GTC GT CT C CAAAT GT CACT C GGAGC GGT C GCT GCT GT ACT AT GAA 1550 

1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500 
Mill Mill I II M I I I II I I II I I I I II I II I I II I M I I I I I I I II M II 



Db 



1551 CTGGAGGACGGACTGTACACTGCTGGTCCTTATTTCTTTGCCAAGGTCCTCGGTGAGCTG 1610 



Qy 


1501 


Db 


1611 


Qy 


1561 


Db 


1671 


Qy 


1621 


Db 


1731 


Qy 


1681 


Db 


1791 


Qy 


1741 


Db 


1851 


Qy 


1801 


Db 


1911 


Qy 


1861 


Db 


1971 


Qy 


1921 


Db 


2031 


Qy 


1981 


Db 


2091 



C CGGAGC ACT GT GC CT AC GT CAT CAT CT ACG C GAT GC C CAT CTACT GGCT GACAAACCT G 1560 

|| | I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

C CAGAGCACT GT GC CTAT GT CAT CAT CT AT GGGAT GC C CAT CTACT GGCT GACCAACCT G 1670 

CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620 

I I I I I I I I I II II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CGGCCAGGGCCTGAGCTCTTCCTCCTGCACTTCATGCTTCTGTGGCTGGTGGTGTTCTGC 1730 

TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680 

| I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I II I I 
TGCAGGACCATGGCCCTGGCCGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1790

TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740 

I MINI || I I I I II I M I I I I II I I I I I I M II I I I I I I I I I I I II I M M M 
TGCTGCAACGCTCTCTACAACTCCTTCTACCTTACGGCTGGCTTCATGATAAACTTGAAC 1850 

AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800 

II I I I I I I I I I I I I II I I I I II I I I I I I I I I I M I M I I I I I I I I I I I I M I I I I I 

AAC CT GT GGAT AGT AC CT GCAT GGATTT CCAAGAT GT C GT T C CT C CG GT GGT GCTT CT CA 1910 

GGGCTGAT GC AGAT T CAATT TAAT GGACAC CT TT ACAC CACACAAAT CGGCAACT T C AC C 1860 

I | | I I I I II II I II I II I II I I I I I I I II I I I I I I I II I II I I I I I I I II I II I I 

GGGCT GAT GCAGATT CAGTT TAAT GGACAC ATT T AC AC CACGCAGAT C GGCAAC CT CAC C 1970 

TTCT CCATCCTCGGAGACACGAT GATCAGT GCCATGGACCT GAACTC GCAT CCACT CTAT 1920 

I I I I I I III M I I I I I Mill Ml M I I I II II I M II II II Mill II III 

TTCTCCGTCCCCGGAGACGCGATGGTCACTGCCATGGACCTGAACTCACATCCTCTTTAT 2030 

GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980

|| II I II II I I I II II II I II M II I II I II I M I M II I I II I I II I M I II 
GCGATCTACCTCATCGTCATTGGCATCAGCTGTGGCTTCCTGTCCCTGTATTATCTGTCC 2090 

T T GAAGCT CAT C AAAC AGAAGT C AAT T C AAGACT GGT GA 2019 

I II I I I I II II II II M II II I II I I II I I I I I M I I 

TT GAAGT T CAT C AAAC AGAAGT C AAT T C AAGAT T GGT GA 2129 
AX685735 2669 bp DNA linear PAT 29-MAR-2003

Sequence 7 from Patent WO02081691.
AX685735

AX685735.1 GI:29371744

Homo sapiens (human)
Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
1

Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.

Abcg5 and abcg8: compositions and methods of use
Patent: WO 02081691-A 7 17-OCT-2002;

Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
(US)

Location/Qualifiers

source 1..2669
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
CDS 100..2121
/note="unnamed protein product; human ABCG8 (hABCG8)"
/codon_start=1
/protein_id="CAD86573.1"
/db_xref="GI:29371745"
/db_xref="REMTREMBL:CAD86573"
/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP
NTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA
IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI
FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ
ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM
PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA
LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI
IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL
YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV
SGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW"

ORIGIN 

Query Match 70.8%; Score 1430; DB 6; Length 2669; 

Best Local Similarity 82.0%; Pred. No. 0; 

Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 
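
The two percentages in the summary lines above appear to follow from the other reported figures by simple arithmetic: Query Match is consistent with Score divided by the run's Perfect score (2019), and Best Local Similarity is consistent with Matches divided by the total of Matches, Mismatches and Indels. The sketch below checks that reading against the numbers printed in this listing. The function names are hypothetical, and the formulas are an inference from the reported values, not a documented definition from the search program.

```python
# Minimal sketch: check how the summary percentages relate to the raw counts.
# Assumption (inferred from the listing, not from tool documentation):
#   Query Match           = Score / Perfect score
#   Best Local Similarity = Matches / (Matches + Mismatches + Indels)

PERFECT_SCORE = 2019  # "Perfect score" reported in the run header


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


def summarize(score, matches, mismatches, indels):
    """Recompute the two summary percentages from the reported counts."""
    aligned_columns = matches + mismatches + indels
    return {
        "query_match": percent(score, PERFECT_SCORE),
        "best_local_similarity": percent(matches, aligned_columns),
    }


# Figures reported for AX685735 in this listing.
ax685735 = summarize(score=1430, matches=1659, mismatches=360, indels=3)
print(ax685735)
```

Under these assumptions the recomputed values agree with the printed 70.8% and 82.0%; the same check on the two following hits (Score 1428.4, Matches 1658, Mismatches 361, Indels 3) gives 70.7% and 82.0%.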

ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159 

TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120 

I M I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
TCGGGCCTC CAGGAT AGAT TGTTCTCCTCT GAAAGT GACAACAGC CT GT ACTT C ACCT AC 219 

AGT GGT CAGT CCAACACTCT GGAGGTCAGAGAT CTCACCTACCAGGTGGACATCGCCTCT 180 
I I I I I Ml I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
AGT GGC CAGC C CAACAC CCT GGAGGT CAGAGACCT CAACT ACCAGGT GGAC CT GGCCT CT 279 

CAGGT GC CT T GGTT TGAGCAGCT GGCT CAGT T CAAGAT AC C CT GGAGGTCT CAT AGC AGC 240 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml M 
CAGGT CCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339 

CAAGACT C CT GT GAGCT GGGCAT C C GAAAT CT AAGCT T CAAAGT GAGGAGT GGACAGAT G 300 

II I II I I I I I M I I II I I I II II I I I I I I I I I I I I M I I I I I I I I I I I I I I 

CAGAATT CTT GT GAGCTGGGCAT CCAGAACCTAAGCTTCAAAGT GAGAAGT GGGCAGATG 399 
CT GGCCAT CATAGGGAGCT CAGGCT GCGGGAGAGCCTCACTACT CGACGT GATCACAGGC 360 

I I I II I I I I I I I II I I M I M I I II I I I I I I I I I I I I II II I I I I I I I I III 

CT GGC CAT CATAGGGAGCT CAGGT T GT GGGAGAGCCT CCTT GCTAGATGTGAT CACT GGC 459 

AGAGGCCACGGT GGCAAGAT GAAAT CAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420 

I M I I I I I I MINIM II I I I I I II II I I I I I I I I I I I I I I I I II II 
CGAGGT CAC GG C GGCAAGAT CAAGT C AGGC CAGAT CT GGAT CAATGGGC AGCCCAGCT CG 519 



Qy 


1 


Db 


100 


Qy


61 


Db 


160 


Qy 


121 


Db 


220 


Qy 


181 


Db 


280 


Qy 


241 


Db 


340 


Qy 


301 


Db 


400 


Qy 


361 


Db 


460 



Qy 



421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II II I II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579

Qy 481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639 

Qy 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 640 CAGGCCCAGCGT GACAAAAGGGT GGAGGACGT GAT CGCGGAGCT GCGGCTT AGGCAGT GC 699 

Qy 601 GC CAAC AC CAGAGT GGGCAACAC GT AT GT ACGT GGG GTGT C C GGGGGT GAGC GC CGAC GA 660 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759

Qy 661 GT GAGCATT GGGGT GCAGCT CCT GT GGAACCCAGGAATCCT CATT CT GGATGAACCCACT 720 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M MINIM 
Db 760 GT CAGC AT T GGGGT GCAGCT CCT GT GGAAC CCAGGAATC CT T ATT CT C GAC GAACC CACC 819 

Qy 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879

Qy 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I III 

Db 880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939 

Qy 841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

II I I I II I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I II I I I I I I I III 
Db 940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

Qy 901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

II I I I I I I I II I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
Db 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

Qy 961 T ACGT GGACTT GACCAGCATCGACAGACGCAGCAAAGAACGGGAGGT GGCCACCGT GGAG 1020 

II MINI I I I I I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I I 
Db 1060 T ATGT GGACCT GACCAGC AT T GACAGGCGC AGC AGAGAGCAGGAATT GGC C AC CAGGGAG 1119 

Qy 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I 
Db 1120 AAGGCT CAGT C ACT CGC AGCC CT GT TT CT AGAAAAAGT GCGT GACTTAGAT GACT TT CT A 1179 

Qy 1081 T GGAAAGCT GAGGCAAAG GAACT CAACAC AAGCACC CACAC AGT C AGCCT GACCCT C ACA 1140 

I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I 

Db 1180 T GGAAAGCAGAGACGAAGGAT CTT GAC GAGGAC AC CT GT GT GGAAAGCAGC GT GAC C C CA 1239 

Qy 1141 CAGGACACTGACTG TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197 

I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I M I 

Db 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299 

Qy 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I III 

Db 1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359 



Qy 



1258 



GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 



Db 



1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419 



Qy 


1318 


Db 


1420 


Qy 


1378 


Db 


1480 


Qy 


1438 


Db 


1540 


Qy 


1498 


Db 


1600 


Qy 


1558 


Db 


1660 


Qy 


1618 


Db 


1720 


Qy 


1678 


Db 


1780 


Qy 


1738 


Db 


1840 


Qy 


1798 


Db 


1900 


Qy 


1858 


Db 


1960 


Qy 


1918 


Db 


2020 


Qy 


1978 


Db 


2080 



AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II II II I I I I I III 
ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479 

T T CAAT GT CAT C CT G GAT GT C GT CT C CAAAT GT C ACT C GGAGAGGT CAAT GCT GTACT AT 1437 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 

TT CAAC GT CATT CT GGAT GT C ATCT C CAAAT GTT ACT C AGAGAGGGCAAT GCTTT ACT AT 1539 

GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M 
GAACT GGAAGAC GGGCT GTACACCACT GGT C CAT ATT T CT TT GC CAAGAT C CT CGGGGAG 1599 

TT GC C GGAGCACT GT GC CT AC GTCAT CATCT ACGCGAT GC CC AT CTACT G GCT GACAAAC 1557 

I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I III 

CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659 

CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

Ml | | I I I I III I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719 

TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

II I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I I I II I I I I I I I I I 
TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779

TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I M 
TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839 

GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

M I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I M 
AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899

TCGGGGCT GATGCAGATTCAATTTAAT GGACACCTTT ACACCACACAAAT CGGCAACTTC 1857 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 

GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959 

ACCTTCT CCAT CCTCGGAGACACGAT GATCAGTGCCAT GGACCT GAACT CGCAT CCACTC 1917 

I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

AC CAT C GCGGT CT CAGGAGAT AAAAT CCT C AGT GC CAT GGAGCT GGACT C GTAC CCT CT C 2019 

TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

II || I I I I I I I I I I I I I I I I Ml I I I I I I I I I II II I II I I I I I I I I 
TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079 

TCCTTGAAGCTCATCAAACAGAAGT CAATT CAAGACT GGTGA 2019 
I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 
T CCT T AAGGTT CAT CAAACAGAAAC CAAGT CAAGACT G GT GA 2121 
Homo sapiens ABCG8 (ABCG8) mRNA, complete cds.
Homo sapiens (human)
Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

1 (bases 1 to 2022)

Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.

Accumulation of Dietary Cholesterol in Sitosterolemia Caused by
Mutations in Adjacent ABC Transporters

Science (2001) In press

2 (bases 1 to 2022)

Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.
Direct Submission

Submitted (09-NOV-2000) Molecular Genetics, University of Texas,
Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd.,
Dallas, TX 75390-9046, USA

Location/Qualifiers

1..2022
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"

1..2022
/gene="ABCG8"

1..2022
/gene="ABCG8"
/note="ATP-binding cassette, subfamily G, member 8"
/codon_start=1
/product="ABCG8"
/protein_id="AAG40004.1"
/db_xref="GI:11692802"
/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP
NTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA
IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI
FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ
ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM
PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA
LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI
IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL
YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV
SGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW"



ORIGIN 



Query Match 70.7%; Score 1428.4; DB 9; Length 2022;

Best Local Similarity 82.0%; Pred. No. 0;

Matches 1658; Conservative 0; Mismatches 361; Indels 3; Gaps 1;



Qy 

Db 



1 AT GGCT GAGAAAACCAAAGAAGAGACCCAGCT GT GGAAT GGGACT GT ACT T CAGGAT GCT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

1 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 60 



Qy 

Db 



61 TCGGGCCTC CAGGACAGCT T GTT CT CCT C GGAAAGT GACAAC AGT CT GT ACTT C AC CT AC 120 

I I M II I I I I I I I I M I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
61 TCGGGCCTC CAGGAT AGAT T GTT CT CCT CT GAAAGT GACAACAGCCT GT ACTT CAC CTAC 120 



Qy 


121 


Db 


121 


Qy 


181 


Db 


181 


Qy 


241 


Db 


241 


Qy 


301 


Db 


301 


Qy 


361 


Db 


361 


Qy 


421 


Db 


421 


Qy 


481 


Db 


481 


Qy 


541 


Db 


541 


Qy 


601 


Db 


601 


Qy 


661 


Db 


661 


Qy 


721 


Db 


721 


Qy 


781 


Db 


781 


Qy 


841 


Db 


841 


Qy 


901 


Db 


901 



AGT GGT C AGT C CAACACT CT GGAGGT CAGAGAT CT CACCT ACC AGGT GGACAT C GCCT CT 180 

I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 180 

CAGGTGCCTTGGTTT GAGCAGCT GGCT CAGTT CAAGATACCCT GGAGGT CT CATAGCAGC 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml M 

CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 240 

CAAGACT C CT GT GAGCT G GGC AT CC GAAAT CTAAGCTT CAAAGT GAGGAGT GGACAGAT G 300 

II I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I 

C AGAAT T CT T GT GAG CT GGGCAT CC AGAAC CT AAGCT T CAAAGT GAGAAGT GGG CAGAT G 300 

CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II II I I I I I I I I III 
CTGGCCAT CATAGGGAGCTCAGGTT GT GGGAGAGCCTCCTT GCTAGATGT GAT CACT GGC 360 

AGAGGCCACGGT GGCAAGAT GAAATCAGGACAAATTTGGATAAATGGGCAACCCAGT ACG 420 

I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I II I I I I I I I I I I I II 
C GAGGT CACGG C GGCAAGAT CAAGT CAGGC CAGAT CTGGAT CAAT GGGCAGC CCAGCT C G 420 

CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

I I I I I I II I I I I I I I I I I I I II II M I I I I I I I I I I I I I I Mill I I I I II 
CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 480

CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

I I I I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
T T GACT GT GCGAGAGAC CT T GGCCTT CAT T GCC CAGAT GC GGCT GCCCAGAAC CTT CT C C 540 

CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I Mill II I I I I I I I I I I II I I I I I 
CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 600 

GCCAACAC CAGAGTGGGCAACAC GT AT GT ACGT GGGGT GT CCGG GGGT GAGC GC CGAC GA 660 

II I I I I I I I I I I I I I I I I I II II I I M I I I I I I I I I I I I I II I I I II 
GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 660 

GT GAGC ATT GGGGT GCAGCT C CT GT GGAAC CCAGGAAT C CT CAT T CT GGAT GAACC CACT 720 

II I I I I II II I I I I I II I II I II I II II I I II I I I I I I I I I I II I M I I I I II II 
GTCAGCATT GGGGT GCAGCT CCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 720 

TCTGGCCTC GACAGCT T CACAGCC C ACAAT CT GGT GACAAC CT T GT C C C GC CT GGC CAAG 780 

Mill I I I II I II I I II II I I II I I I II I I II I I I I I I I I I I I I I I M I II II 

TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 780



I I I I I I I I II I I I I I II I I I I II II II II I I II I I I II I I M I II I I I II II I II I II 



GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

I I I I II I I I I I I I I I I I I II II II II I II I II II I I II I I II I I Mill II I 

GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 900 

GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

II I I I I I MINI I II I III I I I I I I I I I I I II I I M II I I II II I I II I I 

GT C CAGT AT TT CAC AGC CAT CG GCT AC C C CTGT C CT C G CTACAGCAAT CCT GCTGACT T C 960 



Qy 


961 


Db 


961 


Qy 


1021 


Db 


1021 


Qy 


1081 


Db 


1081 


Qy 


1141 


Db 


1141 


Qy 


1198 


Db 


1201 


Qy 


1258 


Db 


1261 


Qy 


1318 


Db 


1321 


Qy 


1378 


Db 


1381 


Qy 


1438 


Db 


1441 


Qy 


1498 


Db 


1501 


Qy 


1558 


Db 


1561 


Qy 


1618 


Db 


1621 


Qy 


1678 


Db 


1681 


Qy 


1738 


Db 


1741 


Qy 


1798 



TACGT GGACTT GACCAGCATCGACAGACGCAGCAAAGAACGGGAGGT GGCCACC GT GGAG 1020 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III I I I I II I I I I I I 
T AT GT GGAC CT GAC CAGC ATT GAC AGGC GCAGCAGAGAGCAGGAAT T GGC CACCAGGGAG 1020 

AAGGCACAGT CT CT T GCAG CC CT GT T CCT AGAAAAAGT ACAAG GCTT TGAT GACT TT CT G 1080 

I I I M I I I I I I I I I I I I II I I I I II I I II I I I I I I I III I I I I I I M I I I 
AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1080 

T GGAAAGCT GAGGCAAAGGAACT CAACACAAGCAC C CACACAGT C AGC CT GAC CCT CACA 1140 

I I II I I I I I I I I I I I I I I I II I I I I I I M Ml 

T GGAAAGCAGAGACGAAGGAT CTT GACGAGGACACCTGT GT GGAAAGCAGCGT GACCCCA 1140 

CAGGACACT GACT G T GGGACT GCT GT T GAGCT GCC C GGGAT GAT AGAGCAGT T T TC C 1197 

I | | M I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1200 

ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257 

II I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I II I I I M I II I I I I I Ml 

ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1260

GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1320 

AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II II I I I I I III 
ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1380 

TTCAAT GTCAT CCT GGATGTCGTCT CCAAAT GTCACT CGGAGAGGT CAATGCT GT ACTAT 1437 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1440

GAGCT GGAAGACGGGCT GT ACACT G CT GGT C CTT AT T T CTTT GC CAAGAT C CT AGGAGAA 1497 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II 
GAACT GGAAGACGGGCT GTACACCACT GGT C CAT ATTT CTT T GC CAAGAT CCT C GGGGAG 1500 

TT GC CGGAGC ACT GT GC CT AC GT C ATCAT CT AC GC GAT GC C CAT CTACT GGCT GACAAAC 1557 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I III 
CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1560 

CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1620 

TGCTGCAGGACCATGGCCCT GGCT GCCTCTGCCATGCTGCCCACCTTCCACATGT CCT CC 1677 

II I I I I I I I I I I I I I I I I I I III I III I I I I I I I I II I I I I I I I I I I I I I I 
TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1680 

TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

M M I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1740 

GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

II I I I I I I I I I I I I I I II I II I I I I I I I I I I I I II I I I I I I I I II II 
AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1800 

T C G GGGCT GAT GCAGAT T CAATT TAAT GGACAC CTTT ACAC CACACAAAT C GG CAACT T C 1857 



Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 



M I I I I I I I I I I I I I I I I I M Ml I I I I I I I I I I I I 

1801 GAAGGGCT GAT GAAGATT CAGTT CAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCT C 1860 

1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917

I I I I I I II I I I I I I II I I I I II I I I I I I I I I I II I I I I I I I I 
1861 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGTCATGGAGCTGGACTCGTACCCTCTC 1920

1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

M I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1921 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 1980 

1978 T CCT T GAAGCT CAT CAAACAGAAGT CAATT CAAGACT GGT GA 2019 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 
1981 T C CTTAAGGTT CAT CAAACAGAAAC CAAGT CAAGACT GGT GA 2022 
AF324494 2679 bp mRNA linear PRI 07-AUG-2001

Homo sapiens sterolin-2 (ABCG8) mRNA, complete cds.

AF324494

AF324494.1 GI:15088539

Homo sapiens (human)
Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

1 (bases 1 to 2679) 

Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.

Two genes that map to the STSL locus cause sitosterolemia: genomic
structure and spectrum of mutations involving sterolin-1 and
sterolin-2, encoded by ABCG5 and ABCG8, respectively

Am. J. Hum. Genet. 69 (2), 278-290 (2001)

21344600

11452359

2 (bases 1 to 2679)

Lu,K., Lee,M.-H. and Patel,S.B.
Direct Submission

Submitted (29-NOV-2000) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
Street, STB541, Charleston, SC 29403, USA

Location/Qualifiers

1..2679
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="2"
/map="2p21; between D2S2294 and D2S2298"
/tissue_type="liver"

1..2679
/gene="ABCG8"

91..2112
/gene="ABCG8"
/codon_start=1
/product="sterolin-2"
/protein_id="AAK84078.1"
/db_xref="GI:15088540"
/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP
NTLEVRDLNCQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA
IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI
FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ
ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM
PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA
LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI
IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL
YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV
SGDKILSAMELDSYPLYAIYLIVIGLSGGFMVXYYVSLRFIKQKPSQDW"



ORIGIN 



Query Match 70.7%; Score 1428.4; DB 9; Length 2679; 

Best Local Similarity 82.0%; Pred. No. 0; 

Matches 1658; Conservative 0; Mismatches 361; Indels 3; Gaps 1; 

Qy 1 AT GG CTGAGAAAACCAAAGAAGAGAC C CAGCT GTGGAAT GGGACT GT ACT TCAGGAT GCT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINI I 

Db 91 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 150 

Qy 61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 151 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 210 

Qy 121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

I I I I I III I I I I I I I I II I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I 
Db 211 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTGCCAGGTGGACCTGGCCTCT 270 

Qy 181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I III II 

Db 271 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 330 

Qy 241 CAAGACTCCT GT GAGCT GGGCATCCGAAATCTAAGCTT CAAAGT GAGGAGTGGACAGATG 300 

II I M I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I II I I I I 

Db 331 CAGAATTCTT GT GAGCTGGGCATCCAGAACCTAAGCTTCAAAGT GAGAAGT GGGCAGAT G 390 

Qy 301 CT GGCCAT CAT AG G GAGCT CAGGCT GCGGGAGAGC CT CACT ACT C GAC GTGAT CACAGGC 360 

I I I II I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I II II I I I I I I I I III 
Db 391 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 450 

Qy 361 AGAGGC CACGGT GGCAAGAT GAAAT CAG GACAAATTT GGAT AAAT GGGCAAC C CAGT AC G 420 

I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I I I I I I I II I I I II II 
Db 451 C GAGGT CACG GCGGCAAGAT CAAGT CAGGCC AGAT CT GGAT CAAT GGGCAGC C CAGCT C G 510 

Qy 421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480 

I I I I II I I I I II I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 511 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 570 

Qy 481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

I I II II I I I I I I I I I I I I t t I I I I M I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 571 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 630 



Qy 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 631 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 690 

Qy 601 GC CAACAC CAGAGTGG GCAACAC GT AT GT ACGT GG GGT GT CC GGGGGT GAGC GC C GACGA 660 

II I I I I I I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 691 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 750 

Qy 661 GT GAGCAT T GGGGT GCAGCT CCT GT G GAAC CC AGGAAT C CT CAT T CT GGAT GAAC C CACT 720 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 751 GT CAGCAT T GGGGT GCAGCT C CT GT G GAAC CC AGGAAT CCTT AT T CT CGACGAAC C CAC C 810 

Qy 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 111111111 
Db 811 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 870

Qy 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M III 

Db 871 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 930 

Qy 841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 931 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 990 

Qy 901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

II I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 991 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1050 

Qy 961 T ACGT GGACTT GACCAGCATCGACAGACGCAGCAAAGAACGGGAGGT GGCCACCGT GGAG 1020 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I I 
Db 1051 TATGTGGACCT GACCAGCATT GACAGGCGCAGCAGAGAGCAGGAATT GGCCACCAGGGAG 1110 

Qy 1021 AAGGCAC AGT CT CT T G CAGCC CT GT T CCT AGAAAAAGT ACAAGGCT TT GAT GACTT T CT G 1080 

I I I I I I I I I I II I I I It I I II I I I I I I I I I I II I I III I I I I I I I I I I I 
Db 1111 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1170 

Qy 1081 T GGAAAGCT GAGGCAAAGGAACT CAACACAAGCAC C CAC AC AGT CAGC CT GACCCT CACA 1140 

I I I I I I I I I I I I I I I I I I I II I I I I I Ml Ml 

Db 1171 T GGAAAGCAGAGAC GAAGGAT CT T GACGAGGACAC CTGT GT GGAAAGC AGC GT GACC C C A 1230 

Qy 1141 C AGGACACT GACT G TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

Db 1231 CTAGAC AC CAACT GC CT C C CGAGT C CTAC GAAGAT GC CT GGGGCGGT GC AGC AGT T T AC G 1290 

Qy 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257 

II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I III 

Db 1291 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1350 

Qy 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 1351 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1410 

Qy 1318 AAGCAGCT CT CCTT CAT GGACACAGC AGCCCT CCT CTT CAT GATAGGGGCGCTCATT CCT 1377 

I II I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I III 
Db 1411 AT C C AGCT CT C CT T CAT GGAT ACAG C C GC CCT CTT GT T CAT GAT C GGT GCT CT CAT CC CT 1470 



Qy 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
        ||||| ||||| ||||||||| ||||||||||| |||| |||||| ||||||| ||||||
Db 1471 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1530

Qy 1438 GAGCT GGAAGAC GGGCT GT ACACT GCT GGT CCTT AT T T CT T T GC CAAGAT C CT AG GAGAA 1497 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II 

Db 1531 GAACT GGAAGAC GGGCT GTACAC CACT GGT CCAT AT T T CT T T GCCAAGAT C CT C GG GGAG 1590 

Qy 1498 TT GCC GGAGCACT GT G C CTACGT CAT CAT CT AC GCGAT GC C CAT CT ACT GGCT GACAAAC 1557 

I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 1591 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1650 

Qy 1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

Ml I I I I I I III I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1651 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1710 

Qy 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

II I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1711 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1770 

Qy 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737

I I I I II I I I I I M I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1771 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1830 

Qy 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1831 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1890

Qy 1798 T CGGGGCT GAT GCAGATT CAAT T TAAT GGACACCT T T ACACCACACAAAT C GGCAACTT C 1857 

I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I 

Db 1891 GAAGGGCT GAT GAAGATT CAGTT CAGCAGAAGAACTT AT AAAAT GC CTCT C GGGAAC CT C 1950 

Qy 1858 AC CT T CT C CAT C CT CGGAGACACGAT GAT CAGT GC C AT GGAC CT GAACT C GCAT CC ACT C 1917 

M I I I I II I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I II 
Db 1951 AC CAT C GC GGT CT CAGGAGAT AAAAT CCT CAGT GC CAT GGAGCT GGACT C GTAC CCT CT C 2010 

Qy 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 2011 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2070 

Qy 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
        ||||| | | |||||||||||||  ||| |||||||||||||
Db 2071 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2112
AX478099
LOCUS AX478099 3239 bp DNA linear PAT 12-AUG-2002
DEFINITION Sequence 29 from Patent WO0240541.
ACCESSION AX478099
VERSION AX478099.1 GI:22217059
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Tang,Y.T., Yue,H., Nguyen,D.B., Hafalia,A.J., Elliott,V.S., Lu,Y.,
Walia,N.K., Yao,M.G., Baughn,M.R., Gandhi,A.R., Ding,L.,
Sanjanwala,M., Ramkumar,J., Arvizu,C., Gietzen,K.J., Lal,P.G.,
Azimzai,Y., Khan,F.A., Thangavelu,K., Thornton,M., Lu,D.A.,
Tribouley,C.M., Warren,B.A., Ison,C.H., Das,D., Raumann,B.E.,
Policky,J.L. and Kearney,L.
TITLE Transporters and ion channels
JOURNAL Patent: WO 0240541-A 29 23-MAY-2002;
Incyte Genomics, Inc. (US)
FEATURES Location/Qualifiers
source 1..3239
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
/note="Incyte ID No: 6585710CB1"
ORIGIN

Query Match 36.8%; Score 743.8; DB 6; Length 3239;

Best Local Similarity 78.9%; Pred. No. 1e-160;

Matches 899; Conservative 0; Mismatches 237; Indels 3; Gaps 1;



Qy 884 GGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATA 943
       |||  |  ||||| ||||| || || |||||| |||| ||| |||| ||||||||||| |
Db  12 GGGGCGGCCAGCACATGGTCCATTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 71

Qy 944 GCAACCCTGCGGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGG 1003
       |||| ||||| |||||||| |||||| |||||||||| ||||| ||||||| ||| | ||
Db  72 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 131



Qy 1004 AGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAG 1063
        |  ||||||||  ||||||||| ||||| || ||||||||||| ||||||||||| |  |
Db  132 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 191

Qy 1064 GCTTTGATGACTTTCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAG 1123
         ||| ||||||||||| |||||||| ||| | ||||| ||  ||     ||||      |
Db  192 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 251

Qy 1124 TCAGCCTGACCCTCACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGA 1180
          |||        | |||  |||||  ||||      || | ||    || |||| |||
Db  252 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 311

Qy 1181 TGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGC 1240
         | |  |||||||| | || ||||||||||||||||||||||| |||||||| |||||||
Db  312 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 371

Qy 1241 CCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTT 1300
        |||| || ||||| |||||| |||| ||||| ||||||||  | | ||| |||||||| |
Db  372 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 431

Qy 1301 ACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGA 1360
        | |  |||||||||  ||  ||||||||||||||||| ||||| |||||| | |||||||
Db  432 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 491

Qy 1361 TAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGA 1420
        | || || ||||| |||||||| ||||| ||||||||| ||||||||||| |||| ||||
Db  492 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 551



Qy 1421 GGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTG 1480
        || ||||||| |||||||| ||||||||||||||||||||  ||||||| ||||||||||
Db  552 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 611

Qy 1481 CCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCA 1540

I M I I I I I I I II II I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I 
Db 612 CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 671 

Qy 1541 TCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCG 1600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731 

Qy 1601 TGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCA 1660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III I I I I I I I I 

Db 732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791 

Qy 1661 CCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCG 1720 

I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851 

Qy 1721 GCT T CAT GATAAACTT GGACAACCT GT GGAT AGT GCCT GC AT GGAT CT C CAAGCT GT CGT 1780 

I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 852 GCT T CAT GATAAACTT GAGCAGC CTGT GGACAGT GCC C GCGT GGATT T C CAAAGT GT C CT 911 

Qy 1781 TCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCA 1840 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I M I I I I I 

Db 912 TCCTGCGGTGGTGTTTT GAAGGGC T GAT GAAGAT T CAGTT C AGCAGAAGAACTT ATAAAA 971 

Qy 1841 CACAAAT CGGCAACTT CACCTTCT CCAT CCT CGGAGACACGAT GATCAGTGCCAT GGACC 1900 

I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I II I 
Db 972 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 1031 

Qy 1901 TGAACT CGC AT C C ACT CT AT GCGAT CT AC CT CAT T GT CAT C GGC AT C AGCTAC GGCT T C C 1960 

II I I I I I I II I I I I I II I I I I I I I I I I I I I I I I III I I I I I II I I I I 

Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091 

Qy 1961 T GT T CCT GT ACT AT CT AT C CT T GAAGCT CAT CAAAC AGAAGT CAATT CAAGACT GGT GA 2019 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I II I I I I I I I I 
Db 1092 T GGTCCT GT ACTACGTGTCCTTAAGGTTCAT CAAACAGAAACCAAGTCAAGACT GGTGA 1150 



RESULT 10
AC122243/C
LOCUS AC122243 204584 bp DNA linear ROD 04-NOV-2003
DEFINITION Mus musculus chromosome 17 clone RP23-148C10, complete sequence.
ACCESSION AC122243
VERSION AC122243.3 GI:38154054
KEYWORDS HTG.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 204584)
AUTHORS Wilson,R.K.
TITLE The sequence of Mus musculus clone
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 204584)
AUTHORS McPherson,J.D. and Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (23-MAY-2002) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
REFERENCE 3 (bases 1 to 204584)
AUTHORS Wilson,R.K.
TITLE Direct Submission
JOURNAL Submitted (06-SEP-2003) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
REFERENCE 4 (bases 1 to 204584)
AUTHORS Wilson,R.K.
TITLE Direct Submission
JOURNAL Submitted (04-NOV-2003) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
COMMENT On Nov 4, 2003 this sequence version replaced gi:34495085.
Genome Center
Center: Washington University Genome Sequencing Center
Center code: WUGSC
Web site: http://genome.wustl.edu
Contact: submissions@watson.wustl.edu
Project Information
Center project name: M_BA0148C10
FEATURES Location/Qualifiers
source 1..204584
/organism="Mus musculus"
/mol_type="genomic DNA"
/db_xref="taxon:10090"
/chromosome="17"
/clone="RP23-148C10"
ORIGIN

Query Match 15.0%; Score 302.2; DB 10; Length 204584;

Best Local Similarity 80.1%; Pred. No. 9.4e-59;

Matches 411; Conservative 0; Mismatches 8; Indels 94; Gaps 1;



Qy     559 CGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGC 618
           | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  204178 CAGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGC 204119

Qy     619 AACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAG 678
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  204118 AACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAG 204059

Qy     679 CTCCTGTGGAA------------------------------------------------- 689
           |||||||||||
Db  204058 CTCCTGTGGAACCCAGGTGAGGCCTGGGAACCTGAGGGGTGAAAACCTGAGCCTACAACC 203999

Qy     690 ---------------------------------------------CCCAGGAATCCTCAT 704
                                                        |||||||||||||||
Db  203998 TGTCCGGCAGCGGCAGCGTGGTCATTGGACTCCCTGTGCAATATCCCCAGGAATCCTCAT 203939

Qy     705 TCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTT 764
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  203938 TCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTT 203879

Qy     765 GTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGA 824
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  203878 GTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGA 203819

Qy     825 CATCTTCAGGCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGG 884
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  203818 CATCTTCAGGCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGG 203759

Qy     885 GGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAG 944
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  203758 GGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAG 203699

Qy     945 CAACCCTGCGGACTTCTACGTGGACTTGACCAG 977
           ||||||||||||||||||||  || | |   ||
Db  203698 CAACCCTGCGGACTTCTACGGTGAGTGGTAAAG 203666



RESULT 11
F351799S06
LOCUS F351799S06 1387 bp DNA linear ROD 23-AUG-2002
DEFINITION Mus musculus sterolin 2 (Abcg8) gene, exon 6.
ACCESSION AF351804
VERSION AF351804.1 GI:18996442
KEYWORDS .
SEGMENT 6 of 13
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 1387)
AUTHORS Lu,K., Lee,M.-H., Yu,H., Zhou,Y., Sandell,S.A., Salen,G. and
Patel,S.B.
TITLE Molecular cloning, genomic organization, genetic variations, and
characterization of murine sterolin genes Abcg5 and Abcg8
JOURNAL J. Lipid Res. 43 (4), 565-578 (2002)
MEDLINE 21904563
PUBMED 11907139
REFERENCE 2 (bases 1 to 1387)
AUTHORS Lu,K., Zhou,Y., Lee,M.-H. and Patel,S.B.
TITLE Direct Submission
JOURNAL Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St., STB 541, Charleston, SC 29403, USA
FEATURES Location/Qualifiers
source 1..1387
/organism="Mus musculus"
/mol_type="genomic DNA"
/strain="129/Sv"
/db_xref="taxon:10090"
/chromosome="17"
/map="between Mit41 and Mit189"
/clone="329B11"
exon 57..326
/gene="Abcg8"
/number=6
ORIGIN

Query Match 13.7%; Score 275.8; DB 10; Length 1387; 

Best Local Similarity 97.2%; Pred. No. 9.7e-53;

Matches 280; Conservative 0; Mismatches 8; Indels 0; Gaps 0; 

Qy 690 CCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAA 749
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  52 CCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAA 111

Qy 750 TCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCA 809
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 112 TCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCA 171

Qy 810 CCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGACATCTGGCAC 869
       ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
Db 172 CCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGACNTCTGGCAC 231

Qy 870 CCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCC 929
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 232 CCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCC 291

Qy 930 TTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAG 977
       |||||||||||||||||||||||||||||||||||  || | |   ||
Db 292 TTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGGTGAGTGGTAAAG 339
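
The percentages in the summary lines of RESULT 11 can be cross-checked by hand. The sketch below rests on an assumption inferred from the numbers printed in this listing, not from any documentation of the search program: Query Match appears to be Score divided by the query's Perfect score (2019 in the run header), and Best Local Similarity appears to be Matches divided by Matches + Mismatches + Indels. The function names are hypothetical.

```python
# Cross-check of the summary percentages printed for a result.
# Assumption: both formulas are inferred from the figures in this listing,
# not taken from the search program's documentation.

PERFECT_SCORE = 2019  # "Perfect score" from the run header of this listing


def query_match_pct(score: float, perfect_score: float = PERFECT_SCORE) -> float:
    """Score expressed as a percentage of the query's perfect score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Matches as a percentage of all aligned columns, gap columns included."""
    return 100.0 * matches / (matches + mismatches + indels)


if __name__ == "__main__":
    # RESULT 11: Score 275.8; Matches 280; Mismatches 8; Indels 0
    print(round(query_match_pct(275.8), 1))                # listing prints 13.7%
    print(round(best_local_similarity_pct(280, 8, 0), 1))  # listing prints 97.2%
```

The same two formulas also reproduce the figures printed for RESULT 10 (15.0% and 80.1%), which is the only evidence offered here that the assumed definitions are the right ones.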



RESULT 12
F351799S11
LOCUS F351799S11 1378 bp DNA linear ROD 23-AUG-2002
DEFINITION Mus musculus sterolin 2 (Abcg8) gene, exon 11.
ACCESSION AF351809
VERSION AF351809.1 GI:18996447
KEYWORDS .
SEGMENT 11 of 13
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 1378)
AUTHORS Lu,K., Lee,M.-H., Yu,H., Zhou,Y., Sandell,S.A., Salen,G. and
Patel,S.B.
TITLE Molecular cloning, genomic organization, genetic variations, and
characterization of murine sterolin genes Abcg5 and Abcg8
JOURNAL J. Lipid Res. 43 (4), 565-578 (2002)
MEDLINE 21904563
PUBMED 11907139
REFERENCE 2 (bases 1 to 1378)
AUTHORS Lu,K., Zhou,Y., Lee,M.-H. and Patel,S.B.
TITLE Direct Submission
JOURNAL Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St., STB 541, Charleston, SC 29403, USA
FEATURES Location/Qualifiers
source 1..1378
/organism="Mus musculus"
/mol_type="genomic DNA"
/strain="129/Sv"
/db_xref="taxon:10090"
/chromosome="17"
/map="between Mit41 and Mit189"
/clone="329B11"
exon 415..682
/gene="Abcg8"
/number=11
ORIGIN



Query Match 13.4%; Score 270.6; DB 10; Length 1378;

Best Local Similarity 88.3%; Pred. No. 1.5e-51;

Matches 294; Conservative 0; Mismatches 39; Indels 0; Gaps 0;



Qy 1484 AGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCT 1543
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  413 AGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCT 472

Qy 1544 ACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGT 1603
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  473 ACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGT 532

Qy 1604 GGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCT 1663
        |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Db  533 GGTTGGTGGTCTTCTGCTGCAGGAACATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCT 592

Qy 1664 TCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCT 1723
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  593 TCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCT 652

Qy 1724 TCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCC 1783
        ||||||||||||||||||||||||||||||     || ||   | |||  |       ||
Db  653 TCATGATAAACTTGGACAACCTGTGGATAGGTGAGGCCTGCTGCCCCACCCCCCGCCCCC 712

Qy 1784 TCCGGTGGTGCTTCTCGGGGCTGATGCAGATTC 1816
            |    || |||   |||   ||  | | |
Db  713 CTTAGCCAAGCGTCTGTAGGCCTCTGTGGCTGC 745
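
The bar row between each Qy and Db line marks the columns where the two sequences carry the same base. A minimal sketch of how such a row can be derived is given below, assuming the two strings are already aligned to equal length and that gap columns are written as `-`; the function name `match_line` is hypothetical and is not part of the search program.

```python
# Build the bar row of an alignment block from two already-aligned strings.
# Assumption: both strings have equal length and gaps are written as "-".

def match_line(qy: str, db: str) -> str:
    """Return '|' where the aligned bases are identical, ' ' elsewhere."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == d and q != "-" else " "
        for q, d in zip(qy.upper(), db.upper())
    )


if __name__ == "__main__":
    # Last block of RESULT 12: Qy 1784..1816 against Db 713..745
    qy = "TCCGGTGGTGCTTCTCGGGGCTGATGCAGATTC"
    db = "CTTAGCCAAGCGTCTGTAGGCCTCTGTGGCTGC"
    bars = match_line(qy, db)
    print(qy)
    print(bars)
    print(db)
    print(bars.count("|"), "identical columns out of", len(bars))
```

Summing the identical columns over all six blocks of RESULT 12 gives the 294 matches reported in its summary line, which is a convenient check when a bar row has been damaged in transcription.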



RESULT 13
AC120701
LOCUS AC120701 237445 bp DNA linear HTG 21-SEP-2002
DEFINITION Rattus norvegicus clone CH230-65H6, *** SEQUENCING IN PROGRESS
4 unordered pieces.
ACCESSION AC120701
VERSION AC120701.4 GI:23265381
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE Rattus norvegicus (Norway rat)
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.



REFERENCE 1 (bases 1 to 237445) 

AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
Davila,M.L., Davis,C., Davy-Carroll,L., DeAnda,C., Dederich,D.,
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,
Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R.,
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K.,
Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J.,
Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F.,
Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K.,
Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V.,
Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von
Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O.,
Weinstock,G. and Gibbs,R.A.

TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 237445)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (09-MAY-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
REFERENCE 3 (bases 1 to 237445)
AUTHORS Rat Genome Sequencing Consortium.
TITLE Direct Submission
JOURNAL Submitted (21-SEP-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
COMMENT On Sep 21, 2002 this sequence version replaced gi:21908396.
The sequence in this assembly is a combination of BAC based reads
and whole genome shotgun sequencing reads assembled using Atlas
(http://www.hgsc.bcm.tmc.edu/projects/rat/). As a result, the
sequence may extend beyond the ends of the clone and there may be
contigs that consist entirely of whole genome shotgun sequence
reads. Both end sequences and whole genome shotgun sequence only
contigs will be indicated in the feature table.
Genome Center
Center: Baylor College of Medicine
Center code: BCM
Web site: http://www.hgsc.bcm.tmc.edu/
Contact: hgsc-help@bcm.tmc.edu
Project Information
Center project name: GXQV
Center clone name: CH230-65H6
Summary Statistics
Assembly program: Phrap; version 0.990329
Consensus quality: 209781 bases at least Q40
Consensus quality: 213033 bases at least Q30
Consensus quality: 214997 bases at least Q20
Estimated insert size: 233017; sum-of-contigs estimation
Quality coverage: 4x in Q20 bases; sum-of-contigs estimation
NOTE: Estimated insert size may differ from sequence length
(see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
NOTE: This is a 'working draft' sequence. It currently
consists of 4 contigs. The true order of the pieces
is not known and their order in this sequence record is
arbitrary. Gaps between the contigs are represented as
runs of N, but the exact sizes of the gaps are unknown.
This record will be updated with the finished sequence
as soon as it is available and the accession number will
be preserved.
1 233866: contig of 233866 bp in length
233867 233966: gap of unknown length
233967 235011: contig of 1045 bp in length
235012 235111: gap of unknown length
235112 236137: contig of 1026 bp in length
236138 236237: gap of unknown length
236238 237445: contig of 1208 bp in length.
FEATURES Location/Qualifiers
source 1..237445
/organism="Rattus norvegicus"
/mol_type="genomic DNA"
/db_xref="taxon:10116"
/clone="CH230-65H6"
misc_feature 1..1326
/note="wgs_end_extension
clone_end:T7"
misc_feature 8065..8944
/note="clone_boundary
clone_end:T7
site:EcoRI
end_sequence:BH350813"
misc_feature complement(232953..233569)
/note="clone_boundary
clone_end:Sp6
site:EcoRI
end_sequence:BH350815"
ORIGIN

Query Match 13.1%; Score 264.8; DB 2; Length 237445; 

Best Local Similarity 76.3%; Pred. No. 4e-50; 

Matches 380; Conservative 0; Mismatches 32; Indels 86; Gaps 1; 

Qy     559 CGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGC 618
           | |||||||||||| || || ||||||||||||||||||||||||||||| | ||||||
Db  146962 CAGGTGGAAGACGTGATTGCGGAGCTGCGGCTGCGGCAGTGCGCCAACACCCGCGTGGGC 147021

Qy     619 AACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAG 678
           ||||| || ||||| |||||||||||||| ||||||||| |||||||||| |||||||||
Db  147022 AACACATACGTACGCGGGGTGTCCGGGGGCGAGCGCCGAAGAGTGAGCATCGGGGTGCAG 147081

Qy     679 CTCCTGTGGAA------------------------------------------------- 689
           |||||||||||
Db  147082 CTCCTGTGGAACCCAGGTGAGGCCTGGGAACCTGAGGGGCGAGGACCTGAGCCTACAACC 147141

Qy     690 -------------------------------------CCCAGGAATCCTCATTCTGGATG 712
                                                ||||||||||||||| |||||||
Db  147142 TGTCCGGCGTGGTCACTGGGCTCCCTGTGCGATACCCCCCAGGAATCCTCATCCTGGATG 147201

Qy     713 AACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCC 772
           |||||||||| ||||||||||||||||| || ||||| ||||||| ||| ||||||||||
Db  147202 AACCCACTTCCGGCCTCGACAGCTTCACCGCTCACAACCTGGTGAGAACTTTGTCCCGCC 147261

Qy     773 TGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCA 832
           ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  147262 TGGCCAAAGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCA 147321

Qy     833 GGCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGC 892
           |||||||||||||||||||||||||||| ||||||||||||||||||||||||| ||| |
Db  147322 GGCTATTTGACCTGGTCCTTCTGATGACGTCTGGCACCCCTATCTACCTGGGGGTGGCAC 147381

Qy     893 AGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTG 952
           |||| |||||||||||||| ||||| |||||| |||||||||||||||| ||||||||||
Db  147382 AGCACATGGTGCAGTACTTTACATCAATTGGCTACCCTTGTCCTCGCTACAGCAACCCTG 147441

Qy     953 CGGACTTCTACGTGGACT 970
           | ||||||||||  || |
Db  147442 CTGACTTCTACGGTGAGT 147459



RESULT 14
AC112747/C
LOCUS AC112747 312858 bp DNA linear HTG 08-OCT-2002
DEFINITION Rattus norvegicus clone CH230-359E1, *** SEQUENCING IN PROGRESS
8 unordered pieces.
ACCESSION AC112747
VERSION AC112747.3 GI:23270105
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE Rattus norvegicus (Norway rat)
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.
REFERENCE 1 (bases 1 to 312858)
AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
Davila,M.L., Davis,C., Davy-Carroll,L., DeAnda,C., Dederich,D.,
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,
Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R.,
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
Taylor, T., Thomas,N., Thomas, S., Tingey, A. , Trejos,Z., Usmani,K., 
Valas,R., Vera,V., Villasana, D . , Waldron,L., Walker, B . , Wang, J. , 
Wang,Q., Wang,S., Warren, J., Warren, R., Wei,X., White, F. , 
Williams, G. , Willson,R., Wleczyk,R., Wooden, H., Worley,K., 
Wright, D . , Wright, R. , Wu,J., Yakub,S., Yen, J., Yoon,L., Yoon,V., 
Yu,F., Zhang, J. , Zhou, J. , Zhou,X., Zhao,S., Dunn,D., von 
Niederhausern,A. , Weiss, R. , Smith, D.R., Holt, R. A., Smith, H . 0 . , 
Weinstock,G. and Gibbs,R.A. 
Direct Submission 
Unpublished 

2 (bases 1 to 312858) 
Worley,K.C. 

Direct Submission 

Submitted (24-FEB-2002 ) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 

3 (bases 1 to 312858) 

Rat Genome Sequencing Consortium. 
Direct Submission 

Submitted ( 08-OCT-2002 ) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 

COMMENT     On Sep 23, 2002 this sequence version replaced gi:21738477.
            The sequence in this assembly is a combination of BAC based reads
            and whole genome shotgun sequencing reads assembled using Atlas
            (http://www.hgsc.bcm.tmc.edu/projects/rat/). Each contig described
            in the feature table below represents a scaffold in the Atlas
            assembly (a 'contig-scaffold'). Within each contig-scaffold,
            individual sequence contigs are ordered and oriented, and separated
            by sized gaps filled with Ns to the estimated size. The sequence
            may extend beyond the ends of the clone and there may be sequence
            contigs within a contig-scaffold that consist entirely of whole
            genome shotgun sequence reads. Both end sequences and whole genome
            shotgun sequence only contigs will be indicated in the feature
            table.

            Genome Center
              Center: Baylor College of Medicine
              Center code: BCM
              Web site: http://www.hgsc.bcm.tmc.edu/
              Contact: hgsc-help@bcm.tmc.edu
            Project Information
              Center project name: GRAX
              Center clone name: CH230-359E1
            Summary Statistics
              Assembly program: Phrap; version 0.990329
              Consensus quality: 241372 bases at least Q40
              Consensus quality: 245333 bases at least Q30
              Consensus quality: 248022 bases at least Q20
              Estimated insert size: 276767; sum-of-contigs estimation
              Quality coverage: 4x in Q20 bases; sum-of-contigs estimation

            * NOTE: Estimated insert size may differ from sequence length
            * (see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
            * NOTE: This sequence may represent more than one clone.
            * NOTE: This is a 'working draft' sequence. It currently
              consists of 8 contigs. The true order of the pieces
              is not known and their order in this sequence record is
              arbitrary. Gaps between the contigs are represented as
              runs of N, but the exact sizes of the gaps are unknown.
              This record will be updated with the finished sequence
              as soon as it is available and the accession number will
              be preserved.
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gap of 
FEATURES             Location/Qualifiers
     source          1..312858
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10116"
                     /clone="CH230-359E1"
     misc_feature    159838..161520
                     /note="wgs_contig"
     misc_feature    166727..168287
                     /note="wgs_contig"
     misc_feature    190162..191648
                     /note="wgs_contig"
     misc_feature    234118..235251
                     /note="wgs_contig"
     misc_feature    290479..292119
                     /note="wgs_contig"



ORIGIN

  Query Match 13.1%; Score 264.8; DB 2; Length 312858;
  Best Local Similarity 76.3%; Pred. No. 4.1e-50;
  Matches 380; Conservative 0; Mismatches 32; Indels 86; Gaps 1;
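The two percentages in this summary appear to follow from the other reported numbers. The sketch below assumes that Query Match is the score divided by the perfect score (2019 for this query) and that Best Local Similarity is the number of matches divided by the number of alignment columns (matches + mismatches + indels). Both definitions are inferred from the values printed in this report, not taken from GenCore documentation, and the function names are illustrative.

```python
def query_match_pct(score, perfect_score):
    """Score expressed as a percentage of the best attainable score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, mismatches, indels):
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# Values copied from the RESULT 14 summary; 2019 is the query's perfect score.
qm = round(query_match_pct(264.8, 2019), 1)
bls = round(best_local_similarity_pct(380, 32, 86), 1)
print(qm, bls)  # 13.1 76.3
```

Under these assumptions the computed values agree with the 13.1% and 76.3% printed for this result.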



Qy    559 CGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGC 618
          | |||||||||||| || || |||||||||||||||||||||||||||||| | ||||||
Db  82228 CAGGTGGAAGACGTGATTGCGGAGCTGCGGCTGCGGCAGTGCGCCAACACCCGCGTGGGC 82169

Qy    619 AACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAG 678
          ||||| || ||||| |||||||||||||| ||||||||| |||||||||| |||||||||
Db  82168 AACACATACGTACGCGGGGTGTCCGGGGGCGAGCGCCGAAGAGTGAGCATCGGGGTGCAG 82109

Qy    679 CTCCTGTGGAA------------------------------------------------- 689
          |||||||||||
Db  82108 CTCCTGTGGAACCCAGGTGAGGCCTGGGAACCTGAGGGGCGAGGACCTGAGCCTACAACC 82049

Qy    690 -------------------------------------CCCAGGAATCCTCATTCTGGATG 712

Db  82048                                                              81989

Qy    713 AACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCC 772
          |||||||||| ||||||||||||||||| || ||||| ||||||| ||| ||||||||||
Db  81988 AACCCACTTCCGGCCTCGACAGCTTCACCGCTCACAACCTGGTGAGAACTTTGTCCCGCC 81929

Qy    773 TGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCA 832
          ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  81928 TGGCCAAAGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCA 81869

Qy    833 GGCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGC 892
          |||||||||||||||||||||||||||| ||||||||||||||||||||||||| ||| |
Db  81868 GGCTATTTGACCTGGTCCTTCTGATGACGTCTGGCACCCCTATCTACCTGGGGGTGGCAC 81809

Qy    893

Db  81808

Qy    953

Db  81748



RESULT 15 

AY145899/C 

LOCUS       AY145899              40929 bp    DNA     linear   ROD 12-NOV-2002
DEFINITION  Rattus norvegicus sterolin 2 (Abcg8) and sterolin 1 (Abcg5) genes,
            complete cds.
ACCESSION   AY145899
VERSION     AY145899.1  GI:24935208
KEYWORDS
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1  (bases 1 to 40929)
  AUTHORS   Yu,H., Lu,K., Lee,M., Pandit,B. and Patel,S.B.
  TITLE     The rat Abcg5 and Abcg8: characterization, chromosomal assignment
            and genetic variation in sitosterolemic rats
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 40929)
  AUTHORS   Yu,H., Lu,K., Lee,M., Pandit,B. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-AUG-2002) Endocrinology, Diabetes and Medical
            Genetics, Medical University of South Carolina, 114 Doughty Street,
            STR 541, Charleston, SC 29403, USA
FEATURES             Location/Qualifiers
     source          1..40929
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /strain="Sprague-Dawley"
                     /db_xref="taxon:10116"
     gene            complement(<4136..>20831)
                     /gene="Abcg8"
     mRNA            complement(join(<4136..4273,4361..4488,5693..5960,
                     6513..6589,6754..6953,8189..8269,8350..8512,10772.
                     11129..11261,11647..11885,15513..15669,17473..17574,
                     20769..>20831))
                     /gene="Abcg8"
                     /product="sterolin 2"
     CDS             complement(join(4136..4273,4361..4488,5693..5960,
                     6513..6589,6754..6953,8189..8269,8350..8512,10772.
                     11129..11261,11647..11885,15513..15669,17473..17574,
                     20769..20831))
                     /gene="Abcg8"
                     /note="ATP-binding cassette sub-family G (WHITE) member 8"
                     /codon_start=1
                     /product="sterolin 2"
                     /protein_id="AAN64276.1"
                     /db_xref="GI:24935210"
                     /translation="MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQS
                     NTLEWDLTYQVT)MASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLA
                     IIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPN
                     LTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCT^TTRVGNTYVRGVSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQ
                     EVATMEKARLLAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQTLTQDTNCGTAAELP
                     GMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAAL
                     LFMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVI
                     IYGMPIYWLTNLRPGPELFLLHFMLLWLWFCCRTMAIAASAMLPTFHMSSFCCNALY
                     NSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVP
                     GDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW"
     gene            <21211..>40564
                     /gene="Abcg5"
     mRNA            join(<21211..21356,21968..22089,24726..24862,24949.
                     27388..27520,28838..28977,29879..30008,30715..30928,
                     31032..31237,32869..33007,35821..36006,38553..38665,
                     40371..>40564)
                     /gene="Abcg5"
                     /product="sterolin 1"
     CDS             join(21211..21356,21968..22089,24726..24862,24949.
                     27388..27520,28838..28977,29879..30008,30715..30928,
                     31032..31237,32869..33007,35821..36006,38553..38665,
                     40371..40564)
                     /gene="Abcg5"
                     /note="ATP-binding cassette sub-family G (WHITE) member 5"
                     /codon_start=1
                     /product="sterolin 1"
                     /protein_id="AAN64275.1"
                     /db_xref="GI:24935209"
                     /translation="MSELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNV
                     SFSVSNRVGPWWNIKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISG
                     RLRRTGTLEGEVFVNGCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSS
                     SADFYDKKVEAVLTELSLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLD
                     EPTTGLDCMTANHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCG
                     TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSD
                     ICHKILENIERTRHLKTLPMVPFKTKNPPGMFCKLGVLLRRVTRNLMRNKQWIMRLV
                     QNLIMGLFLIFYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
                     DQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARFGYFSAALL
                     APHLIGEFLTLVLLGMVQNPNIVNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYF
                     TFQKYCCEILWNEFYGLNFTCGGSNTSVPNNPMCSMTQGIQFIEKTCPGATSRFTTN
                     FLILYSFIPTLVILGMWFKVRDYLISR"
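The feature locations in this record use the GenBank `start..end` notation, optionally wrapped in `join(...)` and `complement(...)`, with `<` and `>` marking partial ends. As a rough illustration of how such a string can be read, the sketch below extracts the segments and their lengths. It uses only the first three Abcg8 segments, because the full join is truncated in this copy of the record; the helper name `segment_lengths` and the shortened location string are illustrative, not part of the record.

```python
import re

# One "start..end" segment; '<' or '>' may flag a partial end.
SEGMENT = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def segment_lengths(location):
    """Return the length in bases of every start..end segment in a location."""
    return [int(end) - int(start) + 1 for start, end in SEGMENT.findall(location)]


# Hypothetical shortened location: only the first three segments of the mRNA.
partial = "complement(join(<4136..4273,4361..4488,5693..5960))"
lengths = segment_lengths(partial)
print(lengths, sum(lengths))  # [138, 128, 268] 534
```

Coordinates are inclusive at both ends, which is why each length is `end - start + 1`.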

ORIGIN 



Query Match 13.1%; Score 264.2; DB 10; Length 40929; 

Best Local Similarity 68.2%; Pred. No. 5.2e-50; 

Matches 429; Conservative 0; Mismatches 113; Indels 87; Gaps 
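Each result in this report carries the same three-line summary of `name value;` fields, so the lines can be pulled into a dictionary with a small regular expression. The sketch below is one possible way to do this. The names `FIELD` and `parse_summary` are illustrative, and the `Gaps` field is left out because its value is missing from this copy of the result.

```python
import re

SUMMARY = """Query Match 13.1%; Score 264.2; DB 10; Length 40929;
Best Local Similarity 68.2%; Pred. No. 5.2e-50;
Matches 429; Conservative 0; Mismatches 113; Indels 87;"""

# A field is a label (letters, dots, spaces), a number, an optional '%', then ';'.
FIELD = re.compile(r"\s*([A-Za-z][A-Za-z. ]*?)\s+([-+0-9.eE]+)%?\s*;")


def parse_summary(text):
    """Map each field label to its numeric value."""
    return {name: float(value) for name, value in FIELD.findall(text)}


parsed = parse_summary(SUMMARY)
for name, value in parsed.items():
    print(f"{name}: {value}")
```

All values are returned as floats for simplicity; a caller that needs integer counts can convert `Matches`, `Mismatches` and `Indels` afterwards.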



Qy 


429 


GGT GAGGAAGT GC GT T GCGCAT GT GC GGCAGCAT GAC CAACT GCT GC C CAACCT GAC C GT 

1 1 M 1 1 I 1 1 1 1 1 III MM 1 1 1 1 

GGT GGGGGT GGGGGT GGGGGTGGGGGAGTTCGCCAAACAAT GCT GCAGGGAAAT GAAGT G 


488 


Db 


11394 


11335 


Qy 


489 


CAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCA 

1 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1 M 1 1 1 1 1 1 
CAGAAGAAGCCTTCCTTGGCATTAAGTGT^GAAGTTGCCCCCTGGACGCTCGTAATGCTC 


548 


Db 


11334 


11275 


Qy 


549 


GCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACAC 

I I I I I I 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 
AGCCTGCCCTCAGGTGGAAGACGTGATTGCGGAGCTGCGGCTGCGGCAGTGCGCCAACAC 


608 


Db 


11274 


11215 


Qy 


609 


C AGAGT GGGCAACAC GTAT GT AC GT GGGGT GT C C GGGGGT GAGCGC C GAC GAGT GAGCAT 

1 1 1 1 1 1 1 1 II INN 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 

CCGCGTGGGCAACACATACGTACGCGGGGTGTCCGGGGGCGAGCGCCGAAGAGTGAGCAT 


668 


Db 


11214 


11155 


Qy 


669 


1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 

CGGGGTGCAGCTCCTGTGGAACCCAGGTGAGGCCTGGGAACCTGAGGGGCGAGGACCTGA 


689 


Db 


11154 


11095 


Qy 


690 


1 1 1 1 1 II 1 1 1 1 1 

GCCTACAACCTGTCCGGCGTGGTCACTGGGCTTCCCTGTGCGATACCCCCCAGGAATCCT 


701 


Db 


11094 


11035 


Qy 


702 


C ATT CT GGATGAAC C CACTTCT GGCCT CGAC AGCTT CACAGC C CACAAT CTGGT GACAAC 
I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 M 
CATCCT GGAT GAAC C CACTT CC GGC CT C GACAGCT T CAC CGCT CACAAC CT GGTGAGAAC 


761 


Db 


11034 


10975 


Qy 

Db 


762 
10974 


CTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTC 

| | || 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M 1 1 II 1 1 1 1 

TTTGT.CCCGCCTGGCCAAAGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTC 


821 
10915 


Qy 

Db 


822 
10914 


T GACATCT T C AGGCT ATT T GAC CT GGT C CTT CT GAT GACAT CT GGCAC C C CT ATCT AC CT 
|| | | | I I I I I I I I 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
T GACAT CT TCAGGCTAT T T GACCT GGT C CTT CTGAT GAC GT CT GG CAC CCCT AT CT AC CT 


881 
10855 


Qy 

Db 


882 
10854 


GGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTA 

1 M 1 1 III 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 
GGGGGT GGCAC AGC ACAT GGT GCAGT ACT TTACATCAATTGGCT AC CCTTGTCCT CGCT A 


941 
10795 


Qy 


942 


TAGCAAC C CT GCGGACT T CT ACGT GGACT 970 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 

CAGCAACCCT GCT GACTTCTAC GGT GAGT 10766 




Db 


10794 





Search completed: February 26, 2004, 06:21:11 
Job time : 5199.97 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on:         February 26, 2004, 00:39:18 ; Search time 511.357 Seconds
                (without alignments)
                16773.223 Million cell updates/sec

Title:          US-09-989-981A-3
Perfect score:  2019
Sequence:       1 atggctgagaaaaccaaaga agtcaattcaagactggtga 2019

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.0

Searched:       3373863 seqs, 2124099041 residues

Total number of hits satisfying chosen parameters:      6747726

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
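The throughput figure in the run header can be reproduced from the other header values. The sketch below assumes that one cell update is one query-residue by database-residue comparison and that both strands of every database sequence are searched (a factor of 2). That factor is an assumption: it is not stated in the report, but it reproduces both the reported rate and the reported hit total, which is exactly twice the number of sequences searched.

```python
QUERY_LENGTH = 2019          # "Perfect score" / query length in bases
DB_SEQUENCES = 3373863       # "Searched: ... seqs"
DB_RESIDUES = 2124099041     # "Searched: ... residues"
SEARCH_SECONDS = 511.357     # "Search time"
STRANDS = 2                  # assumption: forward and reverse strand both searched


def million_cell_updates_per_sec(query_len, residues, strands, seconds):
    """Dynamic-programming cells filled per second, in millions."""
    return query_len * residues * strands / seconds / 1e6


rate = million_cell_updates_per_sec(QUERY_LENGTH, DB_RESIDUES, STRANDS, SEARCH_SECONDS)
hits = DB_SEQUENCES * STRANDS
print(f"{rate:.1f} million cell updates/sec, {hits} candidate hits")
```

The computed rate agrees with the reported 16773.223 to within a small rounding difference, presumably because the search time is printed to only three decimals.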



Database :      N_Geneseq_29Jan04:*
                1:  geneseqn1980s:*
                2:  geneseqn1990s:*
                3:  geneseqn2000s:*
                4:  geneseqn2001as:*
                5:  geneseqn2001bs:*
                6:  geneseqn2002s:*
                7:  geneseqn2003as:*
                8:  geneseqn2003bs:*
                9:  geneseqn2003cs:*
                10: geneseqn2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 
AAD48881 

ID   AAD48881 standard; DNA; 2019 BP.
XX
AC   AAD48881;
XX
DT   24-MAR-2003  (first entry)
XX
DE   Mouse ABCG8 DNA.
XX
KW   ABC family cholesterol transporter; ABCG8; sterol-related disorder;
KW   sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone;
KW   HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;
KW   mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG;
KW   ABCG5; gene; ds.
XX
OS   Mus sp.
XX
FH   Key             Location/Qualifiers
FT   CDS             1..2019
FT                   /*tag= a
FT                   /product= "mABCG8 protein"
FT                   /transl_except= (pos:1318..1320, aa:Leu)
XX
PN   WO200281691-A2.
XX
PD   17-OCT-2002.
XX
PF   20-NOV-2001; 2001WO-US043823.
XX
PR   20-NOV-2000; 2000US-0252235P.
PR   28-NOV-2000; 2000US-0253645P.
XX
PA   (TULA-) TULARIK INC.
PA   (TEXA ) UNIV TEXAS SYSTEM.
XX
PI   Hobbs HH, Shan B, Barnes R, Tian H;
XX
DR   WPI; 2003-058548/05.
DR   P-PSDB; AAE31703.
XX
PT   New ABCG8 polypeptides and nucleic acids, useful for treating sterol-
PT   related disorders e.g. sitosterolemia, hypercholesterolemia,
PT   hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or
PT   nutritional deficiencies.
XX
PS   Claim 13; Page 75; 94pp; English.
XX
CC   The invention relates to ATP-binding cassette (ABC) family cholesterol
CC   transporter, ABCG8 polypeptides and polynucleotides. The invention also
CC   provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known
CC   as sitosterolaemia susceptibility gene (SSG). Sequences of the invention
CC   are useful for treating or preventing sterol-related disorders such as
CC   sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL
CC   deficiency, atherosclerosis and nutritional deficiencies. They are also
CC   useful in gene therapy. The present sequence is mouse ABCG8 DNA
XX
SQ   Sequence 2019 BP; 444 A; 598 C; 510 G; 467 T; 0 U; 0 Other;
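The base counts on the sequence line can be checked against the stated length, and they also give the GC content of the entry. A minimal sketch, using only the numbers printed on that line (the variable names are illustrative):

```python
# Counts copied from the SQ line of entry AAD48881.
counts = {"A": 444, "C": 598, "G": 510, "T": 467, "U": 0, "Other": 0}
stated_length = 2019

total = sum(counts.values())
gc_percent = round(100.0 * (counts["G"] + counts["C"]) / total, 1)

print(total == stated_length, gc_percent)  # True 54.9
```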



  Query Match 100.0%; Score 2019; DB 7; Length 2019;
  Best Local Similarity 100.0%; Pred. No. 0;
  Matches 2019; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

Qy     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

Qy    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

Qy    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

Qy    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300

Qy    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360

Qy    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420

Qy    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

Qy    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

Qy    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600

Qy    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660

Qy    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720

Qy    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780

Qy    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

Qy    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900

Qy    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960

Qy    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020

Qy   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080

Qy   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

Qy   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200

Qy   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260

Qy   1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320

Qy   1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380

Qy   1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440

Qy   1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500

Qy   1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560

Qy   1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620

Qy   1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680

Qy   1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740

Qy   1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800



Qy   1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860

Qy   1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920

Qy   1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980

Qy   1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
          |||||||||||||||||||||||||||||||||||||||
Db   1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019





RESULT 2 
ABN90022 

ID ABN90022 standard; cDNA; 2564 BP. 
XX 

AC ABN90022; 
XX 

DT 16-AUG-2002 (first entry) 
XX 

DE Mouse clone IMX3_67 extended sequence. 
XX 

KW Mouse; antiinflammatory; gene therapy; ileitis; DST; ss; TOGA; 

KW digital sequence tag; total gene expression analysis. 

XX 

OS Mus musculus. 
XX 

PN WO200231114-A2. 
XX 

PD 18-APR-2002. 
XX 

PF 11-OCT-2001; 2001WO-US032091.
XX 

PR 11-OCT-2000; 2000US-0239483P.
XX 

PA (DIGI-) DIGITAL GENE TECHNOLOGIES INC. 
XX 

PI Viney JL, Sims JE, Dubose RF, Baum PR, Hasel KW, Hilbush BS; 
XX 

DR WPI; 2002-426279/45. 
XX 

PT New isolated nucleic acid molecules that are associated with ileitis, for 

PT preventing, treating, modulating and diagnosing ileitis in a mammalian 

PT subject. 
XX 

PS Claim 1; Page 266-268; 273pp; English. 
XX 

CC The invention relates to a novel isolated nucleic acid molecule 

CC comprising a polynucleotide having one of 90 polynucleotide sequences, 

CC given in the specification. The polynucleotides of the invention have 

CC antiinflammatory activity, and may have a use in gene therapy. The 

CC polynucleotide or a polypeptide encoded by it is used for preventing, 

CC treating, modulating or ameliorating a medical condition such as ileitis. 

CC The polypeptide or polynucleotide is also useful for manufacturing a 



CC medicament for treating ileitis. The sequence represents an extended

CC cDNA digital sequence tag obtained from a mouse clone by the TOGA (total 

CC gene expression analysis) method 
XX 

SQ Sequence 2564 BP; 623 A; 722 C; 638 G; 581 T; 0 U; 0 Other; 

Query Match 99.3%; Score 2004.4; DB 6; Length 2564; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 2018; Conservative 0; Mismatches 1; Indels 3; Gaps 1 
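The reported scores can be reproduced from the match, mismatch and indel counts with an affine gap cost built from the header's `Gapop 10.0` and `Gapext 1.0`. The sketch below assumes +1 per match, a mismatch penalty of 0.6, and a cost of Gapop per gap plus Gapext per gapped column. The 0.6 mismatch penalty and the exact form of the gap cost are not stated anywhere in the report; they are inferred only because they reproduce the printed scores, so treat them as assumptions rather than the definition of the IDENTITY_NUC table.

```python
GAP_OPEN = 10.0          # "Gapop" from the run header
GAP_EXTEND = 1.0         # "Gapext" from the run header
MISMATCH_PENALTY = 0.6   # assumption: inferred from the reported scores


def alignment_score(matches, mismatches, indels, gaps):
    """Score under the assumed scheme: +1 per match, affine gap cost."""
    gap_cost = gaps * GAP_OPEN + indels * GAP_EXTEND
    return matches - MISMATCH_PENALTY * mismatches - gap_cost


# Counts copied from the summaries of RESULT 2 (Geneseq) and RESULT 14 (GenBank).
result_2 = round(alignment_score(2018, 1, 3, 1), 1)
result_14 = round(alignment_score(380, 32, 86, 1), 1)
print(result_2, result_14)  # 2004.4 264.8
```

A perfect self-match of the 2019-base query scores 2019 under the same scheme, which matches the "Perfect score" line in the header.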

Qy      1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     35 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 94

Qy     61 TC---GGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 117
          ||   |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     95 TCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACC 154

Qy    118 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 177
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    155 TACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCC 214

Qy 178 TCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGC 237 

I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I II I I I I I I 
Db 215 T CT CAGGT GC CT T GGT TT GAGC AGCT GGCT CAGTT CAAGATAC C CT GGAGGTCT CAT AGC 274 

Qy 238 AGC CAAGACT C CT GT GAGCT GG GCAT C C GAAAT CTAAGCT T CAAAGT GAGGAGT GGACAG 297 

I I I I II I I I I I II I I I I I I I I I I I I I I I I M I I I I I I I I I I II II I I I I I I I I I I I I I I I 
Db 275 AGC CAAGACT C CT GT GAGCT GGGCAT C C GAAAT CTAAGCTTCAAAGT GAGGAGT GGACAG 334 

Qy 298 AT GCT GGCCAT CATAGGGAGCT CAGGCT GCGGGAGAGCCT CACTACT CGACGTGAT CACA 357 

I | | | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M I I I I I I I 
Db 335 AT GCT GGC CAT CATAGGGAGCT CAGGCT GC GGGAGAGC CT CACTACT CGAC GT GAT CACA 394 

Qy 358 GGCAGAGGCCAC GGT GGCAAGAT GAAAT CAGGACAAATTT GGATAAATGGGCAACCCAGT 417 

I | | | | M I I I I I I I I II I I I I I I I I I I I I I II I I II I II M I I I I II I I I I M I I I I I I I 
Db 395 GGCAGAGGC CAC GGT G GCAAGAT GAAAT CAGGACAAAT T T GGAT AAAT GGGCAAC C CAGT 454 

Qy 418 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 477 

I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I II 
Db 455 ACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCC 514 

Qy 478 AACCT GAC C GT CAGAGAGACC CT GGCTT T CAT T GC C CAGAT GCGCCT GC C CAGGAC CTT C 537 

I | I I I I I || I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M II I I I I I I I 
Db 515 AACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTC 574 

Qy 538 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 597 

M I I I I I I I I II I I I I M I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 575 TCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAG 634 

Qy 598 TGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGA 657 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I II I II I I I I I I I I I I I I I II 
Db 635 T GCGCCAACACCAGAGT GGGCAACACGT AT GT ACGT GGGGT GTCCGGGGGT GAGCGCCGA 694 

Qy 658 C GAGT GAGCAT T GGGGT GCAGCTC CT GT GGAACC CAG GAAT C CT C ATT CT GGAT GAAC C C 717 

M | | | | I M I I I I II I I I I I I I II I I I I M I I II II I I II I I I I I I I I I I I I I I I I I I I I 
Db 695 CGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCC 754 



Qy    718 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 777
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    755 ACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCC 814

Qy    778 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 837
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    815 AAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTA 874

Qy    838 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 897
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    875 TTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAA 934

Qy    898 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 957
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    935 ATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGAC 994

Qy    958 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1017
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    995 TTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTG 1054

Qy   1018 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1077
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1055 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 1114

Qy   1078 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1137
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1115 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 1174

Qy   1138 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1175 ACACAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1234

Qy   1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1235 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1294

Qy   1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1295 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1354

Qy   1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1355 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1414

Qy   1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1415 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1474

Qy   1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1475 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1534

Qy   1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1535 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1594



Qy   1558 1617
Db   1595 1654



Qy 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I M 

Db 1655 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1714 

Qy 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1715 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1774 

Qy 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1775 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1834 

Qy 1798 T C GGGGCT GAT GCAGAT T CAATT T AAT GGAC AC CT T TACAC CAC ACAAAT C GGCAACT T C 1857 

I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I M M I I I I I 
Db 1835 T CGGTGCT GATGCAGATT CAATTTAAT GGACACCTTTACACCACACAAATCGGCAACTT C 1894 

Qy 1858 AC CTTCT C C AT CCT C GGAGACAC GAT GAT C AGT GC CAT GGAC CT GAACT C GCAT C CACT C 1917 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1895 AC CTT CT CCAT CCT C GGAGACAC GAT GAT CAGT GC CAT GGAC CT GAACT C GCAT C C ACT C 1954 

Qy 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 1955 TAT GC GAT CT ACCT CAT T GT CAT C GGCAT CAGCT ACGGCT T C CT GT T C CT GTACTAT CT A 2014 

Qy 1978 T C CTT GAAGCT CAT CAAACAGAAGT CAAT T CAAGACT GGT GA 2019 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 2015 T C CTT GAAGCTCAT CAAACAGAAGT CAATTCAAGACT GGT GA 2056 



RESULT 3 
AAD48883 

ID AAD48883 standard; DNA; 2669 BP. 
XX 

AC AAD48883; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG8 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 100..2121 

FT /*tag= a 

FT /product= "hABCG8 protein" 
XX 

PN WO200281691-A2. 



PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31705. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 13; Page 80; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG8 DNA 
XX 

SQ Sequence 2669 BP; 595 A; 768 C; 722 G; 584 T; 0 U; 0 Other; 
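
A record like the SQ line above can be sanity-checked after OCR by confirming that the per-base counts add up to the stated length. The following is a minimal Python sketch; `parse_sq_line` is a hypothetical helper written for illustration and is not part of any search-report tooling.

```python
import re

def parse_sq_line(line):
    """Parse a GeneSeq-style SQ line into (length, per-base counts).

    Hypothetical helper: the field layout is assumed from the SQ lines
    printed in this report.
    """
    length = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
    # Each composition field looks like "595 A;" or "0 Other;".
    counts = {base: int(n)
              for n, base in re.findall(r"(\d+)\s+(A|C|G|T|U|Other);", line)}
    return length, counts

line = "SQ Sequence 2669 BP; 595 A; 768 C; 722 G; 584 T; 0 U; 0 Other;"
length, counts = parse_sq_line(line)
# True when the composition is internally consistent with the length.
print(length, counts, sum(counts.values()) == length)
```

A mismatch between the summed counts and the length would suggest that a digit in the record was misread.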



Query Match 70.8%; Score 1430; DB 7; Length 2669; 

Best Local Similarity 82.0%; Pred. No. 0; 

Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 

Qy 1 AT GGCT GAGAAAAC CAAAGAAGAGACCC AGCTGT GGAAT GGGACT GT ACT T CAGGATGCT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 100 AT G GC C GGGAAGG C GGCAGAGGAGAGAGGGCT GC C GAAAGGGGC C ACT CCC CAGGAT AC C 159 

Qy 61 T CGGGC CT C CAGGAC AGCT T GTT CT CCT C GGAAAGT GACAACAGT CT GT ACTT CAC CT AC 120 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 160 TCGGGCCTC CAGGAT AGAT T GTT CT CCT CT GAAAGT GACAACAGC CTGT ACTT CAC CTAC 219 

Qy 121 AGT GGTCAGT CCAACACTCT GGAGGTCAGAGATCT CACCT ACCAGGTGGACAT CGCCTCT 180 

I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II 
Db 220 AGT GGCCAGCCCAACACCCTGGAGGTCAGAGACCT CAACTACCAGGTGGACCT GGCCT CT 279 

Qy 181 CAGGT GC CT T GGT TT GAGC AGCT GGCT C AGTT CAAGAT AC C CT GGAGGT CT CAT AGCAGC 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IN M 

Db 280 CAGGT CCCTTGGTTTGAGCAGCT GGCT CAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339 

Qy 241 CAAGACT C CT GT GAGCT GGGCAT C C GAAAT CT AAGCT T CAAAGT GAGGAGT GGACAGAT G 300 

II I II I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 340 CAGAATT CTT GTGAGCT GGGCAT CCAGAACCTAAGCTT CAAAGT GAGAAGTGGGCAGAT G 399 



301 CT GGCCATCATAGGGAGCT CAGGCT GCGGGAGAGCCT CAC TACT CGACGT GAT CACAGGC 360 

| | | | | M | I I I I I I I I I I I I I II II I I I I I I I I I I I I II II I I I I I I I I Ml 
400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459 

361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420 

M I I I I I II I II I I I I I M I I I I I II II I I I I I I I I I I I I I I I I I I M 
460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519 

421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480 

I I I I I I I I I I I I I II I I I I I II M II I I I I I I I I I I I I I I I I I I I I I I I I I 
520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579 

481 CT GAC C GT CAGAGAGAC C CT GGCT T T CATT GC C CAGAT GC GC CT GC CCAGGACCTT CTC C 540 

| II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639 

541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 

640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699 

601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759 

661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720 

|| | | | I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
760 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 819 

721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

III! I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II 
820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879 

781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840 

I I I I I I | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I III 

880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939 

841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

|| | | 1 1 I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml 
940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

II I I I I I I I I I I I I I I I Ml I I I I I I I I I I I I I I I I I I I I I I I II I I I I II 
1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

961 TACGT GGACT T GAC CAGCAT CGACAGAC GCAGCAAAGAAC GGGAGGT GGC CACCGT GGAG 1020 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

1060 TAT GT GGACCT GAC CAGCAT T GAC AGGCG C AGCAGAGAGCAGGAATT GGC CAC C AGGGAG 1119 

1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080 

I I I I I MM II I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 
1120 AAGG CT C AGT C ACT C GC AGCC CT GTT T CT AGAAAAAGT GCGT GACT TAGAT GACT TT CTA 1179 

1081 T GGAAAGCT GAGGCAAAGGAACT CAACACAAGCACCCACACAGTCAGCCT GACCCTCACA 1140 

I I I I I I II I I I I I I I I I I I II I I I I I I I I Ml 

1180 T GGAAAGCAGAGAC GAAG GAT CTT GAC GAGGACAC CT GT GT GGAAAGCAGCGT GACC CCA 1239 



Qy   1141 CAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197
          |  |||||  ||||      || | ||    || |||| |||  | |  |||||||| |
Db   1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299

Qy   1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257
          || ||||||||||||||||||||||| |||||||| ||||||||||| || ||||| |||
Db   1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359

Qy   1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317
          ||| |||| ||||| ||||||||  | | ||| |||||||| || |  |||||||||  |
Db   1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419

Qy   1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377
          |  ||||||||||||||||| ||||| |||||| | |||||||| || || ||||| |||
Db   1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479

Qy   1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437
          ||||| ||||| ||||||||| ||||||||||| |||| ||||||| ||||||| ||||||
Db   1480 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1539

Qy   1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497
          || ||||||||||||||||||||  ||||||| |||||||||||||||||||| || ||
Db   1540 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1599

Qy   1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557
           | |||||||||||||||||| |||||||||||| |||||||| |||||||||| | |||
Db   1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659

Qy   1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617
          ||| |||| |  |   ||| |||||| || ||||||||||| |||||| |||||||||||
Db   1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719

Qy   1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677
          || |||||||  ||||||||||| ||| | ||| |||| ||||||||||||||| |||||
Db   1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779

Qy   1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737
          |||||| ||||||||||||||||||||||||||||  | |  ||||||||||||||||||
Db   1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839

Qy   1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797
            || |||||||| |||||| || ||||| |||||  |||| ||||| |||||||| ||
Db   1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899

Qy   1798 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1857
             ||||||||| ||||||| || |   ||     ||| |  |  |   |||| ||| ||
Db   1900 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959

Qy   1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917
          ||| || |  ||   ||||| |  ||  ||||||||||||| ||| ||||| | || |||
Db   1960 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTC 2019

Qy   1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977
          || || ||||||||||| ||||| ||| |||||   |||||| || ||||||||||  |
Db   2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079

Qy   1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
          ||||| | | |||||||||||||  ||| |||||||||||||
Db   2080 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2121



RESULT 4 
ABK83218 

ID ABK83218 standard; cDNA; 3239 BP. 
XX 

AC ABK83218; 
XX 

DT 27-AUG-2002 (first entry) 
XX 

DE Human transporter and ion channel, TRICH9, Incyte ID 6585710CB1, cDNA. 
XX 

KW Human; ss; gene; transporter and ion channel; TRICH; transport disorder; 

KW neurological disorder; muscle disorder; immunological disorder; cancer; 

KW scleroderma; systemic lupus erythematosus; allergy; leukaemia; 

KW cell proliferative disorder; cervical cancer; breast cancer; 

KW neurodegenerative disorder; Parkinson's disease; Alzheimer's disease; 

KW myotonic dystrophy; catatonia; endocrine disorder; diabetes; 

KW Grave's disease; gastrointestinal disorder; Crohn's disease; 

KW renal disorder; Goodpasture's syndrome; viral infection; cirrhosis; 

KW bacterial infection; fungal infection; parasitic infection; 

KW protozoal infection; helminthic infection; cardiovascular disorder; 

KW atherosclerosis; hepatic disease. 

XX 

OS Homo sapiens. 
XX 

PN WO200240541-A2. 
XX 

PD 23-MAY-2002. 
XX 

PF 25-OCT-2001; 2001WO-US046055. 
XX 

PR 27-OCT-2000; 2000US-0243989P. 

PR 03-NOV-2000; 2000US-0245904P. 

PR 09-NOV-2000; 2000US-0247673P. 

PR 17-NOV-2000; 2000US-0249661P. 

PR 20-NOV-2000; 2000US-0252232P. 

PR 01-DEC-2000; 2000US-0250790P. 
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Tang YT, Yue H, Nguyen DB, Hafalia AJA, Elliott VS, Lu Y; 

PI Walia NK, Yao MG, Baughn MR, Gandhi AR, Ding L, Sanjanwala M; 

PI Ramkumar J, Arvizu C, Gietzen KJ, Lai PG, Azimzai Y, Khan FA; 

PI Thangavelu K, Thornton M, Lu DAM, Tribouley CM, Warren BA, Ison CH; 

PI Das D, Raumann BE, Policky JL, Kearney L; 

XX 

DR WPI; 2002-463570/49. 

DR P-PSDB; ABG61539. 
XX 

PT New transporters and ion channels (TRICH) polypeptides, useful for 

PT diagnosing, preventing, and treating disorders associated with an 

PT abnormal expression or activity of TRICH, e.g. immunological, muscular or 

PT renal disorders. 

XX 



PS Claim 5; Page 167-168; 178pp; English. 
XX 

CC The invention relates to human transporters and ion channels (TRICH) 

CC polypeptides, a naturally occurring amino acid sequence 90% identical to 

CC TRICH, a biologically active fragment of TRICH or an immunogenic fragment 

CC of TRICH. Also included are an isolated polynucleotide encoding TRICH, a 

CC recombinant polynucleotide comprising a promoter sequence operably linked 

CC to the TRICH polynucleotide, a cell transformed with the recombinant 

CC polynucleotide, a transgenic organism comprising the recombinant 

CC polynucleotide, an isolated antibody that binds specifically to TRICH, 

CC and screening for compounds which bind to TRICH, modulate TRICH, modulate 

CC TRICH expression or are ant/agonists of TRICH. The polypeptides are 

CC useful for diagnosing, treating, and preventing transport, neurological, 

CC muscle, immunological disorders (e.g. scleroderma, systemic lupus 

CC erythematosus, allergies), cell proliferative disorders such as cancers 

CC (e.g. leukaemia, cervical or breast cancers), neurodegenerative disorders 

CC (e.g. Parkinson's disease, Alzheimer's disease), muscular disorders (e.g. 

CC myotonic dystrophy, catatonia), endocrine disorders (e.g. diabetes, 

CC Grave's disease), gastrointestinal disorders (e.g. Crohn's disease), 

CC renal disorders (e.g. Goodpasture's syndrome), viral, bacterial, fungal, 

CC parasitic, protozoal and helminthic infections, cardiovascular disorders 

CC (e.g. atherosclerosis), or hepatic diseases (e.g. cirrhosis) and many 

CC other diseases and disorders detailed in the specification. They can also 

CC be used in assessing the effects of exogenous compounds on the expression 

CC of nucleic acid and amino acid sequences of transporters and ion 

CC channels. TRICH or its fragments may also be used in screening for 

CC compounds that specifically bind to and modulate the activity of TRICH. 

CC The polynucleotides can be used to create knock-in humanised animals or 

CC transgenic animals to model human disease. The present sequence encodes a 

CC TRICH protein 

XX 

SQ Sequence 3239 BP; 784 A; 822 C; 796 G; 837 T; 0 U; 0 Other; 

Query Match 36.8%; Score 743.8; DB 6; Length 3239; 

Best Local Similarity 78.9%; Pred. No. 4.3e-190; 

Matches 899; Conservative 0; Mismatches 237; Indels 3; Gaps 1; 
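
The summary figures above appear to be mutually consistent under a few assumptions that the report does not state explicitly. Query Match looks like Score divided by the perfect score (2019 for this query), and Best Local Similarity looks like Matches divided by the number of alignment columns. The gap cost below uses the report's Gapop 10.0 and Gapext 1.0; the mismatch penalty of 0.6 is inferred from the printed numbers, not read from the IDENTITY_NUC table. A minimal Python sketch with a hypothetical function name:

```python
def alignment_stats(matches, mismatches, indels, gaps, perfect_score,
                    mismatch_penalty=0.6, gap_open=10.0, gap_ext=1.0):
    """Recompute the summary lines of one hit from its match counts.

    Assumptions (not confirmed by the report): +1 per match, an inferred
    0.6 per mismatch, gap_open per gap plus gap_ext per gapped position.
    """
    columns = matches + mismatches + indels
    score = (matches
             - mismatch_penalty * mismatches
             - gap_open * gaps
             - gap_ext * indels)
    return {
        "score": round(score, 1),
        "query_match_pct": round(100.0 * score / perfect_score, 1),
        "best_local_similarity_pct": round(100.0 * matches / columns, 1),
    }

# Counts taken from the RESULT 4 summary lines above.
stats = alignment_stats(matches=899, mismatches=237, indels=3, gaps=1,
                        perfect_score=2019)
print(stats)
```

Running the same check on each hit is a quick way to spot a misread digit in the Score, Matches, or Mismatches fields.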

Qy    884 GGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATA 943
Db     12

Qy    944
Db     72

Qy   1004
Db    132

Qy   1064
Db    192

Qy   1124 TCAGCCTGACCCTCACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGA 1180
Db    252

Qy 1181 TGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGC 1240 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 312 C GGTGCAGCAGTTTAC GAC GCTGAT CCGTCGT CAGATTTCCAAC GACTT CCGAGAC CTGC 371 

Qy 1241 CCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTT 1300 

II I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I II II I I I I 

Db 372 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 431 

Qy 1301 ACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGA 1360 

I I II I I I I I I I II I I II I I I I II I I I I I I I I I I I I I II I I I I I II I I I I 
Db 432 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 491 

Qy 1361 TAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGA 1420 

I II II I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 492 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 551 

Qy 1421 GGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTG 1480 

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 552 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 611 

Qy 1481 CCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCA 1540 

I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 612 CCAAGATCCT CGGGGAGCTTCCGGAGCACT GT GCCTACAT CAT CATCTACGGGAT GCCCA 671 

Qy 1541 TCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCG 1600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731 

Qy 1601 TGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCA 1660 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I III I III I I I I I I I I 
Db 732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791 

Qy 1661 CCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCG 1720 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851 

Qy 1721 GCTT CAT GAT AAACT T GGACAAC CT GT GGAT AGT GCCT GCAT GGAT CT CCAAGCT GT C GT 1780 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 852 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 911 

Qy 1781 TCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCA 1840 

I I I I II I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I 

Db 912 T CCT GCGGT GGT GT T TT GAAGGGCT GAT GAAGATT CAGT T CAGCAGAAGAACT T AT AAAA 971 

Qy 1841 CACAAAT CGGCAACTTCACCTT CT CCAT CCT CGGAGACACGAT GATCAGTGCCATGGACC 1900 

I II I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I 
Db 972 T GCCT CT C GGGAACCT CACCAT C GC GGT CT CAGGAGAT AAAAT C CT CAGT GCCAT GGAGC 1031 

Qy 1901 T GAACTC GCAT CCACT CTAT GC GAT CT AC CT CATT GT CAT C GGCAT CAGCTACGGCT T C C 1960 

I I Mill I II I I I I I II I I I I I I I I I I I I I I I I III I I I M I I I I I I 

Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091 

Qy 1961 T GTTCCT GTACTAT CTATCCTT GAAGCT CATCAAACAGAAGT CAATTCAAGACT GGT GA 2019 

II M I I I I I I I I I I I I I I I I I I I I II I I II I I I III I I M I I I I I I I I I 

Db 1092 T GGTCCT GTACTAC GT GT CCT TAAGGT T CAT CAAACAGAAAC CAAGT CAAGACT GGT GA 1150 



RESULT 5 
AAH98911 

ID AAH98911 standard; cDNA; 580 BP. 
XX 

AC AAH98911; 
XX 

DT 12-OCT-2001 (first entry) 
XX 

DE Arabidopsis EST-derived coding sequence SEQ ID NO: 768. 
XX 

KW Human; sheep; pig; cow; fruit fly; yeast; hamster; macaque; horse; 

KW tomato; monkey; dog; sea urchin; expressed sequence tag; EST; 

KW diagnostics; forensic test; gene mapping; genetic disorder; biodiversity; 

KW gene therapy; nutrition; ss. 

XX 

OS Arabidopsis thaliana. 
XX 

PN WO200154477-A2. 
XX 

PD 02-AUG-2001. 
XX 

PF 25-JAN-2001; 2001WO-US002687. 
XX 

PR 25-JAN-2000; 2000US-00491404. 

PR 17-JUL-2000; 2000US-00617746. 

PR 03-AUG-2000; 2000US-00631451. 

PR 15-SEP-2000; 2000US-00663870. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Tang YT, Liu C, Zhou P, Qian XB, Wang Z, Chen R, Asundi V; 

PI Cao Y, Drmanac RA, Zhang J, Werhman T; 

XX 

DR WPI; 2001-476164/51. 

DR P-PSDB; AAM24252. 
XX 

PT Isolated polypeptide for treatment of diseases, diagnostics, raising 

PT antibodies and research use. 

XX 

PS Claim 1; Page 664; 1275pp; English. 
XX 

CC The present invention provides the protein and coding sequences of novel 

CC proteins from a variety of organisms, including human, dog, cat, horse, 

CC cow, pig, hamster, monkey, macaque, yeast, bacteria, fruit fly, sea 

CC urchin and tomato. These were derived from expressed sequence tags (ESTs) 

CC from the organism of interest. They can be used in diagnostics, 

CC forensics, gene mapping, identification of mutations, to assess 

CC biodiversity and for nutritional purposes. The present sequence is a cDNA 

CC of the invention 

XX 

SQ Sequence 580 BP; 146 A; 154 C; 116 G; 164 T; 0 U; 0 Other; 

Query Match 11.4%; Score 229.2; DB 4; Length 580; 
Best Local Similarity 84.3%; Pred. No. 2.2e-51; 

Matches 258; Conservative 0; Mismatches 48; Indels 0; Gaps 0; 



Qy 1407 ATGTCACTCGGAGAGGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGG 1466 



I M I I II I I I I I I I I I I I II I I I M I I I I II I I I I I I I MM 

Db 275 AGGTT ACT C AGAGAGGGCAAT G CT T TACT AT GAACT GGAAGAC GGG CT GT ACAC CACT GG 334 

Qy 1467 T C CTT AT TT CTTT GC CAAGATC CTAGGAGAATT GC C GGAGCACT GT G C CT ACGT CAT CAT 1526 

Ml I I I I I II M II I II I M M M M I I II M M 

Db 335 T C CAT ATT T CTTT GCCAAGAT C CT C GGCGAGCT T CC G GAGCACT GT GC CT ACAT CAT CAT 394 

Qy 1527 CTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCT 1586 

| | || | | I || I || I I I I II I I II I I II I II I I M I I I II I I II I I I I I 
Db 395 CTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCT 454 

Qy 1587 ACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTC 1646 

| | | I I I I M I I I I I I I III II II I II II MM M I I I I I I I II I III I 
Db 455 GCACTTCCTGCTGGAGTGGCTGGCGGTCTTCTGTTGCAAGATTATGGTCCTGGCCGCCGC 514 

Qy 1647 TGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTT 1706 

| | MM I I II I I I I M I I I I I I I I I I I I I I I II II II I II M I I I I I 

Db 515 GGGCCTGCTCCCCACCTTACACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTGCTT 574 

Qy 1707 CTACCT 1712 

I I I M I 

Db 575 CTACCT 580 



RESULT 6 
ABK51681 

ID ABK51681 standard; DNA; 1920 BP. 
XX 

AC ABK51681; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding human ABCG5 protein. 



XX 
KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 
KW chromosome 2p21; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 
FT CDS 1..1920 

FT /*tag= a 

FT /product= "Human ABCG5 protein" 

FT /transl_except= (pos: 4..9, aa: GDLSSLTPGGSMGL) 

FT /note= "This sequence contains 13 exons" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 



PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU98984. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 38; Page 36-37; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the human ABCG5 gene located on chromosome 2p21. 

CC This sequence encodes the human ABCG5 protein of the invention 

XX 

SQ Sequence 1920 BP; 440 A; 503 C; 486 G; 491 T; 0 U; 0 Other; 

Query Match 9.9%; Score 199.2; DB 6; Length 1920; 

Best Local Similarity 54.0%; Pred. No. 5e-43; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 

Qy 234 TAGCAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGG 293 

|| I || I I M I I II I II I I M I M 

Db 141 

Qy 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 

I I I I I II I I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I M 

Db 201 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 260 

Qy 354 CACAGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACC 413 

I 1 1 1 1 II II 1 1 I I I I 1 1 1 1 I I 

Db 261 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 320 

Qy 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 

I 1 1 1 I I I I 1 1 1 I 1 1 1 1 II I 

Db 321 

Qy 474 GC CCAAC CT GAC CGT CAGAGAGAC CCTGGCTTT CAT T G C C CAGAT G C GC CT GC CC AGGAC 533 

I | | M I I I I I I I I I II I I I I Ml II I I II I I I I I 

Db 381 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 440 

Qy 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 

| || || I III I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 441 CAATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAG 497 

Qy 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 

|| II II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 498 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 557 

Qy 654 C CGACGAGT GAGCATT GGGGT GC AGCT C CTGT GGAAC C CAGGAAT C CT CAT T CT GGAT GA 713 

| | I I I I | | I I I I I I I I I I I I I I Ml I I I I I I I 

Db 558 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 617 

Qy 714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773 

I I I I I I I I I I I I I I I I I I I I I I IN I I I II 

Db 618 GCCAAC CACAGGC CT G GACT GCAT GACT GCTAAT CAGAT T GT CGT C CT CCT GGT GGAAC T 677 

Qy 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833 

Ml | I M I I I I I I I I I I I I I I I I I I I I I I I I I I IN 

Db 678 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 737 

Qy 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893 

III III I I I I M I I I I I M Ml 

Db 738 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 797 

Qy 894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953 

I I I I II I I I I M I I I M N I I II I I 

Db 798 GGAAAT GCTT GATTT CT T CAAT GACT GC GGT T AC C CT T GT C CT GAAC AT T CAAACCCTTT 857 

Qy 954 GGACT T CTACGT GGACT T GACCAGCAT CGACAGAC GC AGCAAAGAACGGGAGGT GGCCAC 1013 

I I I I I I I I I I I I I M I I II I I I I I I II I I I M M 

Db 858 T GACTT CTAT AT GGAC CTGACGT CAGT GGATAC CCAAAGCAAG GAAC GGGAAAT AGAAAC 917 

Qy 1014 CGT GGAGAAGGCACAG 1029 

I III I III 

Db 918 C T C C AAGAG AGT C C AG 933 



RESULT 7 
AAD22009 

ID AAD22009 standard; DNA; 2340 BP. 
XX 

AC AAD22009; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG). 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21; ds. 

XX 

OS Homo sapiens. 



XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "Human SSG protein" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758. 
XX 

PR 18-APR-2000; 2000US-0198465P. 

PR 15-MAY-2000; 2000US-0204234P. 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13290. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG DNA. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 9.9%; Score 199.2; DB 6; Length 2340; 
Best Local Similarity 54.0%; Pred. No. 5.4e-43; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 



Qy 234 TAGCAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGG 293 

| I I I I I II III III I I I I III I I I I I I I I I 

Db 283 TTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGG 342 

Qy 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 

1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 I 1 1 i 1 1 1 1 I M I ii 

Db 343 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 402 

Qy 354 CACAGGCAGAGGCCACGGTGGCAAGAT GAAAT CAGGACAAATTT GGATAAAT GGGCAACC 413 

I I I I I II II I I I I I I I I I I I I 

Db 403 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 462 

Qy 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 

I I I I I I I I I I I I I I II I I I I I III I I I I I 

Db 463 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 522 

Qy 474 GCC CAAC CT GAC C GT C AGAGAGAC C CT GGCT T T CAT T GC CCAGAT GC GCCT GC C CAGGAC 533 

I I I I I I I I I II I I I I I I I I I Ml II I I I I I I I I I 

Db 523 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 582 

Qy 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 

I || || I III I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 583 CAATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAG 639 

Qy 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 

II || II I I I I I I I I III MM I I I I I I I I I 

Db 640 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 699 

Qy 654 CC GAC GAGT GAGCAT T GGGGTG CAGCT C CT GT GGAAC CCAGGAATC CT C ATT CT GGAT GA 713 

I I I I II I I I I I I I I I I I I I I I I M I I I I M I I 

Db 700 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 759 

Qy 714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773 

I I I I I I I I I I I I I I I I I I I I I I Ml IN M 

Db 760 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 819 

Qy 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 820 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 879 

Qy 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893 

I I I I I I I I I I II II Ml 

Db 880 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 939 

Qy 894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953 

I I I I I I I I I I I I I I I M I I I I I I I I I M II I I II I I 

Db 940 GGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTT 999 

Qy 954 GGACTTCT AC GT GGACT T GAC C AGCAT C GACAGAC GCAGCAAAGAAC GGGAGGT GGCCAC 1013 

I I M I I I I MINIMI I I I I I I I I I I I I I II II 

Db 1000 TGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAAC 1059 

Qy 1014 CGTGGAGAAGGCACAG 1029 

I III I IN 

Db 1060 CTCCAAGAGAGTCCAG 1075 



RESULT 8 
AAD48882 

ID AAD48882 standard; DNA; 2340 BP. 
XX 

AC AAD48882; 



XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "hABCG5 protein" 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31704. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 77; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 DNA 
XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 



Query Match 9.9%; Score 199.2; DB 7; Length 2340; 

Best Local Similarity 54.0%; Pred. No. 5.4e-43; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 



Qy 234 TAGCAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGG 293 
I I I I I I II III III I I I I Ml I I I I I I I I I 
Db 283 TTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGG 342 

Qy 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 
I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M 
Db 343 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 402 

Qy 354 CACAGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACC 413 
I I I I I II II I I I I I I I I I I I I 
Db 403 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 462 

Qy 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 
I I II I I I I I I I I I I I I I I I I I III I I I I I 
Db 463 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 522 

Qy 474 GCCCAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGAC 533 
I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 
Db 523 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 582 

Qy 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 
I || II I III II I I I I I I I I I I I I I I II I I I I I I I 
Db 583 CAATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAG 639 

Qy 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 
II || II I I I I I I I I III I I I I I I I I I I I I I 
Db 640 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 699 

Qy 654 CCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGA 713 
I I I I I I I I I I I I I I I I I I I I I I Ml I I I I I I I 
Db 700 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 759 

Qy 714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773 
I I I I I I I I I I I It I I I I I M I I III Ml II 
Db 760 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 819 

Qy 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833 
III I II I I I I I I I I I I I I I I I I I II I I I I I I I I Ml 
Db 820 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 879 

Qy 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893 
I I I I I I I I I I I III I M I I I I I M I I I 
Db 880 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 939 

Qy 894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953 
I I I I I I I I I I I I I I I M I I II I I I I I I I II I I I I I I 
Db 940 GGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTT 999 

Qy 954 GGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCAC 1013 
I I I I I I I I I I I I I II I I I I I I I II I I I I I I II II 
Db 1000 TGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAAC 1059 

Qy 1014 CGTGGAGAAGGCACAG 1029 
I III I III 
Db 1060 CTCCAAGAGAGTCCAG 1075 



RESULT 9 
ABK51682 

ID ABK51682 standard; cDNA; 2516 BP. 
XX 

AC ABK51682; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 cDNA sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 37-38; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 



CC acid sequence represents the cDNA sequence of human ABCG5 gene located on 

CC chromosome 2p21 

XX 

SQ Sequence 2516 BP; 601 A; 631 C; 636 G; 648 T; 0 U; 0 Other; 

Query Match 9.9%; Score 199.2; DB 6; Length 2516; 

Best Local Similarity 54.0%; Pred. No. 5.6e-43; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 

Qy 234 TAGC AGC CAAGACT C CT GT GAGCT GGGC AT C C GAAAT CTAAGCTT CAAAGT GAGGAGT GG 293 

I I I I I I II III III I I I I III I I I I I I I I I 

Db 317 TTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGG 376 

Qy 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 

Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 377 GCAGAT CAT GT GCAT CCTAGGAAGCT CAGGCT C C GGGAAAAC CAC GCT GCT GGAC GC C AT 436 

Qy 354 CAC AGGCAGAGGCCAC GGT GGCAAGAT GAAAT CAGGACAAATTT GGATAAAT GGGCAAC C 413 

I I I I I II II I I I I I I I I I I I I 

Db 437 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 496 

Qy 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 

I Mill I I I II I I I I I I I I I I I I I I I I I I 
Db 497 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 556 

Qy 474 GCC CAAC CT GAC C GT CAGAGAGAC C CT GGCTTT C AT T GC CCAGAT GCGC CT GCCC AGGAC 533 

I I I I I I I I I I I I II I I I I I I III I I I I I I I I I I I 
Db 557 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 616 

Qy 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 

I II II I III I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 617 CAATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGT CAT GGCAGAGCTGAGTCTGAG 673 

Qy 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 

II II II I I I I I I I I III I III I I I I I I I I I 

Db 674 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 733 

Qy 654 C CGAC GAGTGAGC ATT GGGGT GC AGCT C CT GT GGAAC C CAGGAAT CCT CATT CT GGAT GA 713 

I I I I I I II I I I I M I I I I I I I I Ml I I I I I I I 

Db 734 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 793 

Qy 714 ACCCACT T CT GGC CT C GACAGCTT CAC AGC C CACAAT CT GGT GACAAC CTT GT C C CGC CT 773 

I I I I I I I I I I I I I I I I I I I I I I III III II 

Db 794 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 853 

Qy 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833 

III I I I I I I I I I I I I I I I I I I I II I I I I I I I I I III 

Db 854 GGCTCGCAGGAACCGAATTGTGGTTCT CAC CATT CACCAGCCCCGTTCTGAGCTTTTTCA 913 

Qy 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893 

I I I I I I I I I I I III I II I II I I II III 

Db 914 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 973 

Qy 894 GCAAAT GGT GCAGT ACT T CACAT C CAT T GGC CAC C CT T GT C CT C GCT AT AGCAAC C CT GC 953 

I I I II I I I I I I I I I I II I I I I I I I I I I I II I II II I 

Db 974 GGAAAT GCTT GAT T TCT T CAAT GACTGC GGT TAC C CT T GT C CT GAAC AT T CAAAC CCTT T 1033 



Qy 954 GGACTTCTAC GT GGACTT GACCAGCAT CGACAGACGCAGCAAAGAACGGGAGGTGGCCAC 1013 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II II 

Db 1034 T GACT T CTAT AT GGAC CT GAC GT C AGT GGAT ACC CAAAGCAAGGAAC GGGAAAT AGAAAC 1093 

Qy 1014 CGTGGAGAAGGCACAG 1029 

I III I III 

Db 1094 CT C CAAGAGAGT C C AG 1109 



RESULT 10 
ABK51686 

ID ABK51686 standard; cDNA; 2035 BP. 
XX 

AC ABK51686; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; ss; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Rattus sp. 
XX 

FH Key Location/Qualifiers 
FT CDS 8..1965 

FT /*tag= a 

FT /product= "Rat ABCG5 protein" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
DR P-PSDB; AAU96986. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 
PT acid encoding the polypeptide, useful for treating sitosterolemia, 
PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45-46; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 
CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 
CC predisposition for developing sitosterolemia, arteriosclerosis or heart 
CC disease. The molecules of the invention are also useful for identifying a 



CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the rat ABCG5 protein of the invention. (Updated on 

CC 07-AUG-2003 to correct OS field.) 



XX 

SQ Sequence 2035 BP; 481 A; 533 C; 537 G; 484 T; 0 U; 0 Other; 

Query Match 9.7%; Score 195; DB 6; Length 2035; 

Best Local Similarity 53.9%; Pred. No. 7e-42; 

Matches 424; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 

Qy 261 CAT C CGAAAT CTAAGCT T CAAAGT GAGGAGT GGACAGAT GCT GGCCAT CAT AGGGAGCT C 320 

III I II I Ml I I I I I I I I I I I I II I I I I I I I I I I II I 
Db 214 CCT CAAAGATGTCT CCTT GTACATCGAGAGT GGCCAGACCAT GT GCATCTTAGGTAGCT C 273 

Qy 321 AGGCT G C GGGAGAGC CT CACT ACT C GAC GT GAT C ACAGGC AGAGGCC AC GGT GGCAAGAT 380 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M 

Db 274 AGGCTCAGGGAAAACCACGCTGCTGGACGCCATCTCTGGGAGGCTGCGGCGCACAGGGAC 333 

Qy 381 GAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTG 440 

I I I I I I I I I I I I I I I I III 

Db 334 CTTGGAAGGGGAAGTGTTTGTGAACGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 393 

Qy 441 CGT T GC GCAT GTGC GGCAGC AT GACCAACT GCT GC C CAACCT GAC CGT CAGAGAGAC C CT 500 

Ml I I I I I I I I II I I I I I I I I I I I I I I I II I I I I 

Db 394 CGTCTCCTACCTCCTGCAGAGCGATGTCTTTCTGAGCAGCCTCACGGTGCGGGAGACGCT 453 

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I III II I I I I I I I I I I I I I I III I I I I I 

Db 454 GAGAT ACAC GGC GATGCTGGCTCTCCGCAGCAGCTCCGCGGACTTCTACGACAAGAA 510 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620 

I I I I I I M I I I I I I I I I I I I I I I II II I I I I I II 

Db 511 GGT AGAG GCAGT C CT GACAGAGCT GAGT CT GAGC CACGT GGCAGAC CAAAT GAT C GGCAA 570 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 571 CTATAATTTTGGGGGGATTTCCAGTGGCGAGCGGCGCCGAGTGTCCATCGCAGCCCAACT 630 

Qy 681 CCT GT GGAAC CCAGGAAT C CTC AT T CT GGAT GAACC CACT T CT GGC CT C GACAGCTT CAC 740 

Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 631 CCTT C AGGAC CC CAAGGT CAT GAT GCT T GACGAGCCAAC C ACAGGACT GGACT GCAT GAC 690 



Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 



II I 1 1 I 1 1 1 1 1 1 I 1 1 1 1 1 I III I - I II I I 

Db 691 TGCAAATCATATCGTCCTCCTCTTGGTCGAGCTGGCTCGCAGGAACCGCATTGTAATTGT 750 

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860 

I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I MM 

Db 751 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCACCACTTCGACAAAATTGCCATTCTGAC 810 

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920 

III III I I I I I I I I II I I I I I I I I I 

Db 811 TTACGGAGAGTTGGTGTTCTGTGGCACGCCAGAGGAGATGCTCGGCTTCTTCAATAACTG 870 

Qy 921 T GGC CACC CTT GT CCT C GCT ATAGCAAC C CT GC GGACTT CT AC GT GGACTT GAC C AGC AT 980 

Ml I I II II I I II II I II I I I I I I I I I I I II I II II I I I 

Db 871 T GGT T AC C C CT GT CCT GAACATT C CAAT C CCT T TGATTT CTACAT GGACTT GACAT C GGT 930 

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGC 1040 

I I I I I I I II I I I I I I I I I I II II I III I I I I I 

Db 931 G GACACC CAAAGCAGAGAGC GAGAGAT AGAGACGT ACAAGC GAGT CC AGAT GCT GGAAT C 990 

Qy 1041 CCTGTTC 1047 

I I I 

Db 991 TGCCTTC 997 



RESULT 11 
ABK51684 

ID ABK51684 standard; DNA; 1915 BP. 
XX 

AC ABK51684; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW ds. 
XX 

OS Mus sp. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1915 

FT /*tag= a 

FT /partial 

FT /product= "Mouse ABCG5 protein" 

FT /transl_except= (pos: 1912..1915, aa: LGIVIFKVRDYLISR) 

FT /note= "This sequence lacks a stop codon" 

XX 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 



PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96985. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42-43; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 1915 BP; 453 A; 502 C; 484 G; 476 T; 0 U; 0 Other; 



Query Match 9.2%; Score 186.6; DB 6; Length 1915; 

Best Local Similarity 53.1%; Pred. No. 1.3e-39; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 

Qy 261 CAT CCGAAAT CTAAGCTT CAAAGT GAGGAGT GGACAGATGCT GGCCAT CATAGGGAGCTC 320 

III I I I I III I I I I I I II II I I I II I II I I II I I I I I I 
Db 207 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 266 

Qy 321 AG GCT GC GGGAGAGC CTCACT ACT CGAC GT GAT C ACAGGCAGAG GC C AC GGT GGCAAGAT 380 

I I I I I I I I I II I I I II I I I I I I I I I I I I I I M 

Db 267 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 326 

Qy 381 GAAAT C AGGACAAAT TTG GATAAAT GGGCAAC CC AGT AC GC CT CAGCT GGT GAGGAAGT G 440 

I I I I I I I I I I I I I I I I III 

Db 327 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 386 

Qy 441 CGTT GC GCAT GT GC G GCAGCAT GACCAACT GCT GC C CAAC CT GAC C GT CAGAGAGAC CCT 500 

II I I II I MM III I III II Ml II II I I I I I I I 

Db 387 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 446 



Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I III II I I I I I III I I I I I III I M I 

Db 447 GC GAT AC ACAGC GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAA 503 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620 

I I I I I I II I I I I I I M I I I I I I I i II II I I I I I I 

Db 504 GGT AGAGGCAGT CAT GACAGAGCT GAGC CT GAGC CAC GT GGC G GACCAAAT GATT GGCAG 563 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I Ml III I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 564 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 623 

Qy 681 CCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCAC 740 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 624 CCTTCAGGACCCCAAGGT CAT GAT GCT AGAT GAGCCAACCACAGGACTGGACTGCAT GAC 683 

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 

II I I III I I I I I I I I I I I III I I I I I I 

Db 684 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 743 

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 

Db 744 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGAC 803 

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920 

III III I I I I I I I I I I I I I I I I I I 

Db 804 TTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTG 863 

Qy 921 TGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCAT 980 

III I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I 

Db 864 T GGTT AC CC CT GT C CT GAACATT C CAAT C C CTTT GAT T TT TACAT GGACT T GACAT CAGT 923 

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGT GGCCACCGT GGAGAAGGCACAGT CTCTT GCAGC 1040 

I I I I I I I I I I I I I I I I I II II II I I I I I I I I I 

Db 924 GGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGAT GCT GGAATG 983 

Qy 1041 CCTGTTCCTAGAA 1053 

III III 
Db 984 TGCCTTCAAGGAA 996 



RESULT 12 
AAD48880 

ID AAD48880 standard; DNA; 1959 BP. 
XX 

AC AAD48880; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds. 
XX 



OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 1..1591 

FT /*tag= a 

FT /product= "mABCG5 protein" 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31702. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 73; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 DNA 
XX 

SQ Sequence 1959 BP; 468 A; 506 C; 495 G; 490 T; 0 U; 0 Other; 

Query Match 9.2%; Score 186.6; DB 7; Length 1959; 
Best Local Similarity 53.1%; Pred. No. 1.3e-39; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 
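
The three summary lines above can be cross-checked against each other. The sketch below is an illustration only: the report states Gapop 10.0 and Gapext 1.0, but the mismatch penalty of 0.6 and the rule that a gap of n residues costs Gapop + n x Gapext are assumptions inferred from the reported figures, not documented GenCore behaviour. The function name `alignment_metrics` is hypothetical.

```python
def alignment_metrics(matches, mismatches, gap_lengths, perfect_score,
                      mismatch_penalty=0.6, gap_open=10.0, gap_ext=1.0):
    """Recompute Score, Query Match and Best Local Similarity from counts.

    mismatch_penalty and the gap-cost rule are ASSUMPTIONS inferred from
    the report, not values it states.
    """
    # Assumed cost of one gap of n residues: gap_open + n * gap_ext.
    gap_cost = sum(gap_open + gap_ext * n for n in gap_lengths)
    score = matches - mismatch_penalty * mismatches - gap_cost
    # Alignment columns = matched + mismatched + gapped positions.
    columns = matches + mismatches + sum(gap_lengths)
    return {
        "score": round(score, 1),
        "query_match_pct": round(100.0 * score / perfect_score, 1),
        "best_local_similarity_pct": round(100.0 * matches / columns, 1),
    }


# RESULT 12: Matches 421; Mismatches 369; Indels 3; Gaps 1; perfect score 2019.
print(alignment_metrics(421, 369, [3], 2019))
# {'score': 186.6, 'query_match_pct': 9.2, 'best_local_similarity_pct': 53.1}
```

Under these assumptions the recomputed values agree with the reported Score 186.6, Query Match 9.2% and Best Local Similarity 53.1%.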

Qy 261 CATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATGCTGGCCATCATAGGGAGCTC 320 

III I I I I III I I I I I I I I I I I I I II I I I I I I I I I I I I I 

Db 207 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 266 

Qy 321 AGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGAT 380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 267 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 326 

Qy 381 GAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTG 440 

I I I I I I I I I I I I I I I I III 

Db 327 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 386 



Qy 441 CGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCT 500 

II I I II I I II I III I I I I I I I I I I I I I I I I I I I I 

Db 387 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 446 

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I III II I I I I I III I I I I I III I I I I 

Db 447 GCGATACACAGC---GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAA 503 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620 

I I I I I I till I I I I I I I I I I I I I I II II I I I I II 

Db 504 GGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAG 563 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I M I III I I I I I I I I I I M II I I I I I I I I I I I I I I 

Db 564 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 623 

Qy 681 CCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCAC 740 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 624 CCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGAC 683 

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 

II I I III I I I I I I I I I I I III I I I I I I 

Db 684 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 743 

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MM 

Db 744 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGAC 803 

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920 

III III I I I I II I II I I I II I I I I 

Db 804 TTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTG 863 

Qy 921 TGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCAT 980 

III I I I I I I I I II II I I I I I II I I I I I I I I I I I I I II I 

Db 864 TGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGT 923 

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGC 1040 

I II I I II M I II I I II I II II II Mill Mil 

Db 924 GGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATG 983 

Qy 1041 CCTGTTCCTAGAA 1053 

III III 

Db 984 TGCCTTCAAGGAA 996 



RESULT 13 
AAD22008 

ID AAD22008 standard; DNA; 2258 BP. 
XX 

AC AAD22008; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG). 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 



KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; ds. 
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 47..2005 

FT /*tag= a 

FT /product= "Mouse SSG protein" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758. 
XX 

PR 18-APR-2000; 2000US-0198465P. 

PR 15-MAY-2000; 2000US-0204234P. 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13289. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG DNA. Mouse SSG is located on chromosome 17 

XX 

SQ Sequence 2258 BP; 549 A; 579 C; 567 G; 563 T; 0 U; 0 Other; 

Query Match 9.2%; Score 186.6; DB 6; Length 2258; 
Best Local Similarity 53.1%; Pred. No. 1.4e-39; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 



Qy 261 CATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATGCTGGCCATCATAGGGAGCTC 320 

Db 253 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 312 

Qy 321 AGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGAT 380 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II 

Db 313 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 372 

Qy 381 GAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTG 440 

I I I I I I I I I I I I I I I I III 

Db 373 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 432 

Qy 441 CGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCT 500 

II I I I I I I I I I III I I I I I I I I I I I I I I I I II I I 

Db 433 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 492 

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I III II Mill III I I I I I III I II I 

Db 493 GCGATACACAGC---GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAA 549 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620 

I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I 

Db 550 GGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAG 609 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I III III I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 610 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 669 

Qy 681 CCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCAC 740 

III I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 670 CCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGAC 729 

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 

II I I III I II I I I I I I I I III I I I I I I 

Db 730 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 789 

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 790 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGAC 849 

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920 

III III I I I I I I I I I I I I I I I I I I 

Db 850 TTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTG 909 

Qy 921 TGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCAT 980 

III I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 910 TGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGT 969 

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGC 1040 

I I I I I I I I I I I I I I I I I II II II I I I I I I I I I 

Db 970 GGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATG 1029 

Qy 1041 CCTGTTCCTAGAA 1053 

III III 

Db 1030 TGCCTTCAAGGAA 1042 



RESULT 14 
ABK51685 

ID ABK51685 standard; cDNA; 2354 BP. 
XX 

AC ABK51685; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 cDNA sequence. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW ss. 
XX 

OS Mus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the cDNA sequence of the mouse ABCG5 gene of the 



CC invention 
XX 

SQ Sequence 2354 BP; 573 A; 604 C; 594 G; 583 T; 0 U; 0 Other; 
Query Match 9.2%; Score 186.6; DB 6; Length 2354; 
Best Local Similarity 53.1%; Pred. No. 1.4e-39; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 

Qy 261 CAT C C GAAAT CT AAGCT T CAAAGT GAGGAGT GGAC AGAT GCT GGC CAT CATAGGGAG CTC 320 

Ml Mil III I I I I M I I I I I I II I I I I I I I I I I I I I 
Db 345 C CT CAAAGAT GT CT C CT T GT ACAT C GAGAGT GGC CAGAT TAT GTGC AT CT TAGGCAGCT C 404 

Qy 321 AGGCT GCGGGAGAGC CT CACT ACT C GAC GT GATC ACAGGC AGAGGC CAC GGT GGCAAGAT 380 

Mill I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 405 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 464 

Qy 381 GAAAT CAGGACAAAT TT GGAT AAAT GGGCAAC CC AGT AC GC CT CAGCT G GTGAGGAAGT G 44 0 

I I I I I I I I I I I I I I I I III 

Db 465 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 524 

Qy 441 CGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCT 500 

II I I I I I I I I I III I I I I I I I I I I I I I MINI I 

Db 525 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 584 

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I I II I I I I I I I III I I I I I I I I I I I I 

Db 585 GC GAT ACACAGC GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAA 641 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 62 0 

I I I II I I I I I I I I I I I I I I I I I I I II M I I M I I 

Db 642 GGT AGAG GCAGT CAT GACAGAGCT GAGCCT GAGC CAC GT G GC GGACCAAAT GATT GGCAG 701 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I Ml III I I II I I I M II I II I II I I MM I I I II 

Db 702 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 761 

Qy 681 C CT GT GGAACC CAGGAAT C CT CAT T CT GGAT GAACC CACT T CTGGC CT CGACAGCT T CAC 74 0 

Ml II I I I I II I II I II I I II I II II II II I II II II 

Db 762 C CT T CAGGACCC CAAGGT CAT GAT GCT AGAT GAG C CAACC ACAGGACT GGACT GCAT GAC 821 

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 

II I I III II I I I II II I I I M II M I I 

Db 822 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 881 

Qy 801 CT C C CT C CAC CAGC CT C GCT CT GACAT CTT CAGGCT AT TT GAC CT GGT C CTT CTGAT GAC 860 

I I M I I I II I I II II II I I II I Mill I Mill I I II II 

Db 882 CAC CAT C CAC CAGC CT C GCT CT GAGCT CT T C CAACACTTC GACAAAAT T GC C AT C CT GAC 941 

Qy 8 61 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 92 0 

III II I II I I II II M I II II I I I 

Db 942 TTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTG 1001 

Qy 921 T G GC CAC CCT T GT C CT C G CTAT AGCAAC C CTGC GGACT T CTAC GT GGACTT GAC CAGCAT 98 0 

II I I II I I II II I II III M II I II II M I I II I I II I 

Db 1002 T GGT TAC C CCT GT C CT GAACATT C CAAT C C CT T T GAT TTT TACAT GGACT T GACAT CAGT 1061 



Qy 981 C GACAGAC GC AGCAAAGAACGGGAGGT GGCCAC C GT GGAGAAGGCACAGT CT CTT GCAGC 1040 



1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 II II II I 1 1 1 1 1 1 I I 

Db 1062 GGACAC CCAAAGCAGAGAGC GGGAAAT AGAAACGT ACAAGCGAGT ACAGAT GCT GGAAT G 1121 

Qy 1041 CCTGTTCCTAGAA 1053 

I I I I I I 
Db 1122 TGCCTTCAAGGAA 1134 



RESULT 15 
ABK51687 

ID ABK51687 standard; cDNA; 1069 BP. 
XX 

AC ABK51687; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding hamster ABCG5 protein. 
XX 

KW Hamster; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW ss. 
XX 

OS Cricetinae. 



XX 

FH Key Location/Qualifiers 

FT CDS 30..1049 

FT /*tag= a 

FT /partial 

FT /product= "Hamster ABCG5 protein" 

FT /note= "This sequence lacks both a start and stop codon" 

XX 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96987. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 47; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 



CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the hamster ABCG5 protein of the invention. 

CC (Updated on 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 1069 BP; 266 A; 282 C; 273 G; 248 T; 0 U; 0 Other; 

Query Match 8.7%; Score 176; DB 6; Length 1069; 

Best Local Similarity 56.5%; Pred. No. 7e-37; 

Matches 348; Conservative 0; Mismatches 265; Indels 3; Gaps 1; 
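
For bulk checking, the three statistics lines printed with each result can be read into a dictionary. This is a minimal sketch: `STAT_RE` and `parse_stats` are hypothetical names, and the pattern assumes the `label value;` layout seen in this listing rather than any documented GenCore output format.

```python
import re

# One "label value;" pair per match; the optional "%" covers percentage fields.
STAT_RE = re.compile(
    r"(Query Match|Score|DB|Length|Best Local Similarity|Pred\. No\.|"
    r"Matches|Conservative|Mismatches|Indels|Gaps)\s+([0-9.eE+-]+)%?;"
)


def parse_stats(block):
    """Return {label: float value} for every statistic found in the text."""
    return {label: float(value) for label, value in STAT_RE.findall(block)}


# Statistics lines of RESULT 15, copied from the listing.
result_15 = """
Query Match 8.7%; Score 176; DB 6; Length 1069;
Best Local Similarity 56.5%; Pred. No. 7e-37;
Matches 348; Conservative 0; Mismatches 265; Indels 3; Gaps 1;
"""
stats = parse_stats(result_15)
print(stats["Score"], stats["Matches"], stats["Pred. No."])
```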

Qy 437 AGT GC GTTGC GCAT GT GC GGCAGC AT GACCAACT GCT GCC CAAC CT GAC CGT CAGAGAGA 496 

I I I I I I I I I I I I I I I III I I I I II I I I I I I I I I I I I 

Db 118 ACTGCTTCTCCTATGTCCTGCAGAGCGACGTCTTTCTGAGCAGTCTCACGGTGCGAGAGA 177 

Qy 4 97 CCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACA 556 

I I II III II Ml I I I I I I I I M II I II I I I M 

Db 178 CGCTGCGCTACACGGCGATGCTGGCCCTCCGCAGTAGCTCTTCGGACTTCTA TGACA 234 

Qy 557 AACGGGT GGAAGAC GTAAT CGC C GAGCT GC GGCT GC GGCAGT G C GCCAACAC CAGAGTGG 616 

I I I I I I I I I I I I I I I I I I I I I I I I II II I II 

Db 235 AGAAGGTAGAGGCAGTCAT GGAAGAGCTAAGT CTGAGCCACGT GGCAGACCGAATGATTG 294 

Qy 617 GCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGC 676 

I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2 95 GCAACTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTCTCCATCGCAGCCC 354 

Qy 677 AGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCT 736 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 355 AACT C ATT C AGGAC C C CAAGAT CAT GAT GTT T GAT GAGC CAAC CAC AGGACT GGACT GC A 414 

Qy 737 T CACAGCC CACAAT CT GGT GACAAC CTT GT C CC GC CT GGC CAAGGGCAAC AGG CT G GT GC 796 

I I I I I I I 1 II I I I I I I I I I I I Ml I I I I 

Db 415 TGACTGCAAATCAAATTGTCATCCTCCTGGCAGAGCTGGCTCGCAGGGACCGCATTGTGA 474 

Qy 797 TCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGA 856 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I I Mill I I 

Db 475 T C GT CAC C AT CC AC CAGC CT C GCT CT GAGCT CT T T CAACACT T C GACAAAAT T GC C AT C C 534 

Qy 857 TGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACAT 916 

M II I II Ml M I I I I I I II I I I I I I I I I I I 

Db 535 TGACTTACGGAGAGATGGTGTTCTGTGGCACGCCGGAGGAAATGCTCGACTTCTTCAATA 594 



Qy 917 CCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCA 976 

I III I I I I I I I I I I I II Mllll I I I I I I I I I I I I I I I I I I I 

Db 595 GCT GT GGT T AC C CTT GT C CT GAACAT T CCAAC C C CT TT GACT T CTACAT GGACT T GACAT 654 

Qy 977 GCAT C GAC AGAC GCAGCAAAGAAC GGGAGGT GGC C AC C GT GGAGAAGGC ACAGT CT CTT G 1036 

I I I I I I I I I I I I I I I I II III III I III III 

Db 655 CAGT GGAT AC C CAGAGCAGAGAGC GAGAAAT AGAAACCT ACAAGAGAGT C CAGAT GCT CG 714 

Qy 1037 CAGCCCTGTTCCTAGA 1052 

II III III 

Db 715 AATCTGCCTTCAGAGA 730 



Search completed: February 26, 2004, 01:19:48 
Job time : 517.357 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 

Run on: February 26, 2004, 00:48:03 ; Search time 97.675 Seconds 
(without alignments) 
11471.161 Million cell updates/sec 

Title: US-09-989-981A-3 
Perfect score: 2019 
Sequence: 1 atggctgagaaaaccaaaga agtcaattcaagactggtga 2019 

Scoring table: IDENTITY_NUC 
Gapop 10.0 , Gapext 1.0 

Searched: 682709 seqs, 277475446 residues 

Total number of hits satisfying chosen parameters: 1365418 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
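
The header names a Smith-Waterman ("sw") model with an identity scoring table and affine gaps (Gapop 10.0, Gapext 1.0). The sketch below shows a score-only local alignment with affine gaps (Gotoh's recurrences) under those parameters. It is an illustration, not the GenCore implementation: the function name `sw_affine` is hypothetical, the mismatch value of -0.6 is not stated in the report, and the convention that a gap of n residues costs Gapop + n x Gapext is an assumption.

```python
def sw_affine(query, db, match=1.0, mismatch=-0.6, gap_open=10.0, gap_ext=1.0):
    """Best local alignment score with affine gaps (score only, no traceback).

    mismatch=-0.6 and the gap cost gap_open + n * gap_ext are ASSUMED,
    not taken from the report.
    """
    neg_inf = float("-inf")
    cols = len(db)
    h_prev = [0.0] * (cols + 1)   # H values of the previous row
    f = [neg_inf] * (cols + 1)    # best score ending in a vertical gap, per column
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * (cols + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, cols + 1):
            # Extend an open gap or open a new one (open + first extension).
            e = max(e - gap_ext, h_cur[j - 1] - gap_open - gap_ext)
            f[j] = max(f[j] - gap_ext, h_prev[j] - gap_open - gap_ext)
            s = match if query[i - 1] == db[j - 1] else mismatch
            # Local alignment: never let a cell drop below zero.
            h = max(0.0, h_prev[j - 1] + s, e, f[j])
            h_cur[j] = h
            best = max(best, h)
        h_prev = h_cur
    return best


print(sw_affine("ACGT", "ACGT"))  # 4.0
```

With these settings an identical sequence of length L scores L, which is consistent with the perfect score of 2019 reported for the 2019-residue query. The pure-Python double loop is quadratic in time and meant only for short test sequences.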

Database : Issued_Patents_NA:* 

1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:* 

2: /cgn2_6/ptodata/2/ina/5B_COMB.seq:* 

3: /cgn2_6/ptodata/2/ina/6A_COMB.seq:* 

4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:* 

5: /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq:* 

6: /cgn2_6/ptodata/2/ina/backfiles1.seq:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 

  No.   Score  Match   Length  DB  ID                     Description
    1   132.4    6.6     2418   4  US-09-245-808-2        Sequence 2, Appli
    2    63.4    3.1     7218   1  US-08-232-463-14       Sequence 14, Appl
    3    62.2     --     4159  --  US-09-614-912-139      Sequence 139, App
    4    59.4     --     3376  --  US-09-620-312D-918     Sequence 918, App
    5      55     --     1977  --  US-09-614-912-143      Sequence 143, App
    6    52.8     --  4403765  --  US-09-103-840A-2       Sequence 2, Appl
    7    52.8     --  4411529  --  US-09-103-840A-1       Sequence 1, Appl
    8      51     --     2031  --  US-09-614-912-137      Sequence 137, App
    9    49.8     --      630  --  US-09-489-039A-932     Sequence 932, App
   10    49.8     --      960  --  US-09-489-039A-945     Sequence 945, App
   11    49.6     --      627  --  US-09-252-991A-12021   Sequence 12021, A





   12      --     --       --  --  --                     Sequence 11963, A
   13      --     --       --  --  --                     Sequence 11890, A
c  14      --     --       --  --  --                     Sequence 12050, A
c  15      --     --       --  --  --                     --
c  16      --     --       --  --  --                     --
c  17      --     --       --  --  --                     --
   18    48.4     --       --   4  --                     --
   19      --     --       --  --  --                     --
c  20    46.8     --       --   4  --                     --
   21    46.8     --       --   4  --                     --
   22      --     --       --   4  --                     --
c  23    45.8     --       --   4  --                     --
   24      --     --       --   4  --                     --
c  25    44.6    2.2       --   4  --                     --
   26    44.6     --       --   4  --                     --
c  27    44.6    2.2       --   4  --                     --
   28    44.4     --       --   1  --                     Sequence 3, Appli
   29      44     --       --   4  --                     --
   30    43.6     --       --   4  --                     --
c  31      43    2.1       --   4  --                     --
   32      43     --       --   4  --                     --
   33      43    2.1       --   4  --                     --
c  34      43    2.1       --   4  --                     --
   35      43    2.1       --   4  --                     --
   36      43    2.1       --   4  --                     --
   37      43    2.1       --   4  --                     --
c  38      43    2.1       --   4  --                     --
   39      43    2.1       --   4  --                     Sequence 1, Appli
   40    42.8     --       --   4  --                     --


c  41    42.4    2.1      411   4  US-09-252-991A-5107    Sequence 5107, Ap
   42    42.4    2.1     1875   4  US-09-252-991A-5054    Sequence 5054, Ap
   43    42.4    2.1     1962   4  US-09-252-991A-5020    Sequence 5020, Ap
c  44    42.4    2.1     2295   4  US-09-252-991A-5162    Sequence 5162, Ap
   45    42.4    2.1    35081   2  US-08-752-760A-1       Sequence 1, Appli



ALIGNMENTS 



RESULT 1 
US-09-245-808-2 

; Sequence 2, Application US/09245808 
; Patent No. 6313277 
; GENERAL INFORMATION: 
; APPLICANT: Doyle, L. Austin 
; APPLICANT: Abruzzo, Lynne V. 
; APPLICANT: Ross, Douglas D. 
; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 
; TITLE OF INVENTION: encodes it 
; FILE REFERENCE: Ross UMb conversion 
; CURRENT APPLICATION NUMBER: US/09/245,808 
; CURRENT FILING DATE: 1999-02-05 
; EARLIER APPLICATION NUMBER: 60/073763 
; EARLIER FILING DATE: 1998-02-05 
; NUMBER OF SEQ ID NOS: 7 
; SOFTWARE: PatentIn Ver. 2.0 
; SEQ ID NO 2 
; LENGTH: 2418 
; TYPE: DNA 
; ORGANISM: Human MCF-7/AdrVp cells 
US-09-245-808-2 

Query Match 6.6%; Score 132.4; DB 4; Length 2418; 

Best Local Similarity 51.9%; Pred. No. 1e-26; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2; 

Qy 304 GCC AT CAT AGGGAGCT CAGGCT GC GGGAGAGC CT CACT ACT C GAC GT GAT CACAGGCAGA 363 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I 
Db 467 GCCAT C CT GGGAC C CAC AGGTGGAGGCAAAT CT T CGT TAT T AGAT G TCTTAGCTGCA 523 

Qy 364 GGC CAC GGT GGCAAGAT GAAAT C AGGACAAAT T T GGATAAAT GGGCAAC C C AGT AC GCCT 423 

I III I I I I I I I I II I I I I I I I I I III I I I 

Db 524 AGGAAAGAT CCAAGT GGAT TAT CT GGAGAT GT T CT GAT AAAT GGAGCAC C GC GACCT GC C 583 

Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483 

II I I I I I I I I I I I I I I I I I I I I M IN 

Db 584 AAT TT C AAAT GT AAT T C AGGT TACGT GGTACAAGATGAT GTTGTGAT GGGCACT CT G 640 

Qy 484 ACC GT CAGAGAGAC CCT GGCTT T CAT T GC CC AGAT GC GCCT GC CCAGGAC CT T CT C C CAG 543 

I I I I I II III M I I I I I II II I I I 

Db 641 ACGGT GAGAGAAAACTT ACAGTT CTCAGCAGCT CTTC GGCTTGCAACAACT ATGACGAAT 700 

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603 

I I I I I I I I I I I I I I I I I I I II I I II 

Db 701 CAT GAAAAAAACGAACGGATTAACAGGGTCATTCAAGAGTT AGGT CT GGATAAAGT GGCA 760 

Qy 604 AACAC CAGAGT G GGCAAC ACGT AT GT AC GT GGGGT GT C C GGGGGT GAGCGC CGAC GAGT G 663 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II 

Db 761 GACT C CAAGGTT GGAACT CAGT T TAT CC GT GGT GT GT CT GGAGGAGAAAGAAAAAGGACT 820 

Qy 664 AGCATT GGGGT GCAGCT C CTGT GGAAC C CAGGAAT C CT CAT T CT GGAT GAACC CACTT CT 723 

I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I II 

Db 821 AGT AT AG GAATGGAGCT TAT CACT GAT CCTT C CAT CT TGT T CTT GGAT GAG CCTACAACT 880 

Qy 724 GG C CT CGACAGCTT CAC AGCC CACAAT CT GGT GACAACCT T GTC C C GCCT GGC CAAGGGC 783 

III 1 III I I I I I I I II I I I I I I I I M I 

Db 881 GGCT T AGACT CAAGCAC AGCAAAT GCT GT C CT T TT GCTC CT GAAAAGGAT GT CT AAGC AG 94 0 

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 941 GGAC GAACAAT CAT CT T CT C CATT C AT CAGCCT C GAT AT T C CAT CT T CAAGTT GTT TGAT 1000 

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903 

M I I I I I I I I I I I I I I I I I I I I I I I II I II 

Db 1001 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 1060 

Qy 904 CAGTACT T CAC AT CCAT T GGC C ACC CTT GT C CTCGCTAT AGCAAC C CT GC GGACT T CT AC 963 

I I I I I III III II III Mill I II I I I II I I I I I I I I 

Db 1061 GGATACTTTGAATCAGCTGGTTATCACTGTGAGGCCTATAATAACCCTGCAGACTTCTTC 112 0 

Qy 964 GT GGACT TGA 973 

I I I I I I I 
Db 1121 TTGGACATCA 1130 



RESULT 2 

US-08-232-463-14 

; Sequence 14, Application US/08232463 
; Patent No. 5670367 
; GENERAL INFORMATION: 

APPLICANT: DORNER, F. 
APPLICANT: SCHEIFLINGER, F. 
APPLICANT: FALKNER, F. G. 
; TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 
NUMBER OF SEQUENCES : 52 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Foley & Lardner 

; STREET: 1800 Diagonal Road, Suite 500 

; CITY: Alexandria 

; STATE : VA 

COUNTRY: USA 
ZIP: 22313-0299 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/232,463

; FILING DATE: 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/07/935,313
FILING DATE: 

APPLICATION NUMBER: EP 91 114 300.6 
; FILING DATE: 26-AUG-1991 

; ATTORNEY/AGENT INFORMATION: 
; NAME: BENT, Stephen A. 

REGISTRATION NUMBER: 29,768 

REFERENCE/DOCKET NUMBER: 30472/114 IMMU
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703)836-9300 

TELEFAX: (703)683-4109 
; TELEX: 899149 

; INFORMATION FOR SEQ ID NO: 14: 
; SEQUENCE CHARACTERISTICS: 

LENGTH: 7218 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
; IMMEDIATE SOURCE: 

CLONE: pTZgpt-Fls 
US-08-232-463-14 

Query Match 3.1%; Score 63.4; DB 1; Length 7218; 

Best Local Similarity 7.7%; Pred. No. 2.8e-07; 

Matches 34; Conservative 227; Mismatches 178; Indels 0; Gaps 0; 

Qy 1567 GTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGG 1626

| | | | | | | :::::::::::::::::::: : : : : ::::::::: 
Db 1056 GAGCTTGCGATYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1115 



Qy 1627 ACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGC 1686 

Db 1116 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1175 



Qy 1687 AATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTG 1746 

Db 1176 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1235

Qy 1747 TGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTG 1806 

Db 1236 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1295 

Qy 1807 AT GCAGAT T CAAT T T AATGGACAC CT TT AC ACC ACACAAAT C GGCAACT T CACCT T CT C C 1866 

Db 1296 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1355 

Qy 1867 AT CCT CGGAGACACGAT GAT CAGT GCCAT GGACCT GAACTCGCATCCACTCTAT GCGAT C 1926 

Db 1356 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1415 

Qy 1927 TACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAG 1986 

::::::::::: : III I I I I III I I I I I II 

Db 1416 YYYYYYYYYYYYYYYYYYYGTACCAAATTCTTCTATCTCTTTAACTACTTGCATAGATAG 1475 

Qy 1987 CTCATCAAACAGAAGTCAA 2005

I I I I I I I I I I 

Db 1476 GTAATTACAGTGATGCCTA 1494



RESULT 3 

US-09-614-912-139 

Sequence 139, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 



; PRIOR APPLICATION NUMBER: 60/172,959 

; PRIOR FILING DATE: 1999-12-21 

; PRIOR APPLICATION NUMBER: 60/172,946 

; PRIOR FILING DATE: 1999-12-21 

; NUMBER OF SEQ ID NOS: 204

; SOFTWARE: Microsoft Office 97 

; SEQ ID NO 139 

LENGTH: 4159 

TYPE: DNA 

ORGANISM: Oryza sativa 
US-09-614-912-139 

Query Match 3.1%; Score 62.2; DB 4; Length 4159; 

Best Local Similarity 49.1%; Pred. No. 4.5e-07; 

Matches 194; Conservative 0; Mismatches 198; Indels 3; Gaps 1; 

Qy 583 CTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCC 642 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 396 CT GGGAT T GGAT AT AT GCGC GGACACGAT C GT C GGC GAC C AGAT GC AGAGGGGGAT CT CC 455 

Qy 643 GGGGGT GAG CG C C GACGAGT GAGCATT G GGGT GCAGCT C CT GTG GAACC CAGGAAT C CT C 702 

I I I I I I I I I I II I I I I I I I II I I II I M 

Db 456 GGT GGT C AGAAGAAAC GCGT C ACCACCGGT GAGAT GAT T GT CGGT C CAACAAAGGT T CTA 515 

Qy 703 ATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACC 762

I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I 

Db 516 TTCAT GGAT GAGAT AT CAACT GGATTGGACAGCTCCACCACATT CCAGATTGT CAAATGC 575 

Qy 763 TTGTCCCG CCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGC 819 

I I III I I I I I I I I I I I I I I I I II I I I I I 

Db 576 CTT CAGCAAAT C GT GCACT T GGGCGAG G CAACCAT C CT CAT GT C ACT C CT ACAACCAGCC 635 

Qy 820 T CT GAC AT CTT CAGGCT AT T T GAC CTGGT C CT T CT GAT GACAT CT GGCACCC CTAT CT AC 879 

I I I I I II I I I I I I I I I III I I I I III I I I I 

Db 636 CCT GAGACT TT T GAGCTATT CGAT GACAT TAT C CT ACT GT CAGAAGGCCAGATT GT T TAT 695 

Qy 880 CTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGC 939 

III! I I I I I I I I I I I II II II II I I 

Db 696 CAGGGAC C CC GC GAATACGT C CTT GAGTT CTTT GAGT CAT GCGGATT C C GCT GC C CAGAG 755 

Qy 940 T AT AGCAACC CT GC GGACT T CT AC GT GGACTT GAC 974 

II I I II I I I I I III I I I I 

Db 756 CGTAAGGGTACT GCAGACTTT CTT CAGGAGGTGAC 790 



RESULT 4 

US-09-620-312D-918 

Sequence 918, Application US/09620312D 
Patent No. 6569662 
GENERAL INFORMATION: 
APPLICANT: Tang, Y. Tom 
APPLICANT: Liu, Chenghua 
APPLICANT: Asundi, Vinod 
APPLICANT: Zhang, Jie 
APPLICANT: Ren, Feiyan 
APPLICANT: Chen, Rui-hong 
APPLICANT: Zhao, Qing A. 



APPLICANT: Wehrman, Tom 
APPLICANT: Xue, Aidong J. 
APPLICANT: Yang, Yonghong 
APPLICANT: Wang, Jian-Rui 
APPLICANT: Zhou, Ping 
APPLICANT: Ma, Yunqing 
APPLICANT: Wang, Dunrui 
APPLICANT: Wang, Zhiwei 
APPLICANT: John Tillinghast 
APPLICANT: Drmanac, Radoje T. 

TITLE OF INVENTION: Novel Nucleic Acids and
TITLE OF INVENTION: Polypeptides 
FILE REFERENCE: 784CIP2B 

CURRENT APPLICATION NUMBER: US/09/620,312D
CURRENT FILING DATE: 2000-07-19 
PRIOR APPLICATION NUMBER: 09/552,317 
PRIOR FILING DATE: 2000-04-25 
PRIOR APPLICATION NUMBER: 09/488,725 
PRIOR FILING DATE: 2000-01-21 
NUMBER OF SEQ ID NOS: 1105 
SOFTWARE: pt_FL_genes Version 1.0 
SEQ ID NO 918 
LENGTH: 3376 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(2808)
US-09-620-312D-918 

Query Match 2.9%; Score 59.4; DB 4; Length 3376; 

Best Local Similarity 48.4%; Pred. No. 2.4e-06; 

Matches 279; Conservative 0; Mismatches 271; Indels 27; Gaps 3; 

Qy 280 AAAGT GAGGAGT GGACAGAT GCT GGCCAT CAT AGGGAGCT C AGGCT GC GGGAGAGC CT C A 339 

I I I I I I I I I I I I I I I I I I I I I I I I I I I III I M 

Db 88 AAATTCTGCCGCCGGGAGCTGATTGGCATCATGGGCCCCTCAGGGGCTGGCAAGTCTACA 147 

Qy 340 CTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGG 399

I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 148 TT CATGAACAT CTTGGCAGGATACAGGGAGTCT GGAATGAAG GGGCAGAT CCT G 201 

Qy 400 ATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAG 459 

I I I I I I III II I I I I I I II III 

Db 202 GTTAATGGAAGGCCACGGGAGCTGAGGACCTTCCGCAAGATGTCCTGCTACATCATGCAA 261 

Qy 460 CAT GACCAACT GCT GC C CAAC CT GAC C GT C AGAGAGACC CT GG CT T T CATT GC C CAGAT G 519 

I I I I I I I I I I I I I I I I I I I II II I I I I II III I II 

Db 262 GATGACATGCTGCTGCCGCACCTCACGGTGTTGGAAGCCATGATGGTCTCTGCTAACCTG 321 

Qy 520 CGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCC 579 

II I II I I I I I I I I II I I I I I I I I 

Db 322 AAGCTGAGTGAGA AGC AGGAGGT GAAGAAGGAGCT GGT GAC AGAGAT C CT GAC G 375 

Qy 580 GAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTG 639

I I I I I I I I I I I I I I I I I I I I I II II 

Db 376 GCACT GGGCCT GAT GT CGT GCT CCCACACGAGGACAGCC CTGCTC 420 



Qy 640 TCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATC 699 

I I M I I III II II I I I I I I I II I I I I I I I II 

Db 421 TCTGGCGGGCAGAGGAAGCGTCTGGCCATCGCCCTGGAGCTGGTCAACAACCCGCCTGTC 480 

Qy 700 CTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACA 759 

I I I I I I I I I I I II I I I I I I I I I I II II I I I I I I 

Db 481 ATGTTCTTTGATGAGCCCACCAGTGGTCTGGATAGCGCCTCTTGTTTCCAAGTGGTGTCC 540 

Qy 760 ACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGC 819 

III I II I I I I I I I II I II I I I I I I I I I I I I I I 

Db 541 CT C AT GAAGT CC CT GGCAC AGGGGGGC C GTAC CAT CAT CT GC ACCAT CC AC CAGCC CAGT 600 

Qy 820 TCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGA 856 

I I I I I I I I I I I I I I I II II 

Db 601 GCCAAGCT CTTT GAGAT GTTT GACAAGTGCAT CTT CA 637 



RESULT 5 

US-09-614-912-143 

Sequence 143, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 143 
LENGTH: 1977 
TYPE: DNA 

ORGANISM: Triticum aestivum 
US-09-614-912-143 



Query Match 2.7%; Score 55; DB 4; Length 1977; 

Best Local Similarity 51.4%; Pred. No. 3.1e-05; 

Matches 127; Conservative 0; Mismatches 120; Indels 0; Gaps 0;



Qy 613 GTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGG 672 

I I I I II II I I I I I I I I I I I I I I I I I I 

Db 248 GTTGGGCTCCCTGGAGTGAATGGTCTATCAACTGAGCAACGCAAGAGGCTTACAATTGCC 307 

Qy 673 GTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGAC 732 

I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 308 GTGGAGCTTGTTGCTAACCCGTCGATCATTTTTATGGATGAGCCAACATCTGGTCTTGAT 367 

Qy 733 AGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTG 792

III I I I I I I I II I I I II I I I I 

Db 368 G CT C GT GCAGCT GCAAT T GT GAT GAGGACT GT T AGGAACACT GTTAACACT G GC AGGACC 427 

Qy 793 GTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTT 852

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 

Db 428 GTTGTTTGCACCATCCACCAGCCAAGTATTGACATATTTGAAGCATTTGATGAGCTTTTC 487

Qy 853 CTGATGA 859 

I I I I I I 

Db 488 TTGATGA 494



RESULT 6 

US-09-103-840A-2 

Sequence 2, Application US/09103840A 
Patent No. 6294328 
GENERAL INFORMATION : 
APPLICANT: FLEISCHMAN, Robert D. 
APPLICANT: WHITE, Owen R. 
APPLICANT: FRASER, Claire M. 
APPLICANT: VENTER, John C. 

TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 
TITLE OF INVENTION: TUBERCULOSIS 
FILE REFERENCE: 24366-20007.00 
CURRENT APPLICATION NUMBER: US/09/103,840A
CURRENT FILING DATE: 1998-06-24 
NUMBER OF SEQ ID NOS: 2 
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 2 

LENGTH: 4403765 
TYPE : DNA 

ORGANISM: Mycobacterium tuberculosis 
FEATURE:
OTHER INFORMATION: CDC 1551
OTHER INFORMATION: "n" bases at various positions throughout the sequence
OTHER INFORMATION: represent a, t, c or g
US-09-103-840A-2 



Query Match 2.6%; Score 52.8; DB 3; Length 4403765;

Best Local Similarity 49.4%; Pred. No. 0.0067; 

Matches 178; Conservative 0; Mismatches 167; Indels 15; Gaps 1; 

Qy 451 GTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCTGGCTTTCATT 510 

I I I I I I I I I I I III I I I I I II I I I I I I I I I I 



Db 1965645 GTGCCACAGGACGACGTGGTGCACGGTCAGCTGACCGTGAAACACGCGCTGATGTATGCC 1965704



Qy 511 GCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGAC 570 

I I I I I I I I I I I I I I I I I I I III I I III 

Db 1965705 GCCGAACTACGGCTGCCGCCGGACACCACCAAAGATGACCGCACCCAGGTAGTTGCCCGG 
1965764 

Qy 571 GTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTA 630 

II III I I I I III II I I I I I I I I I I I I I I I 

Db 1965765 GTGCTCGAAGAACTCGAGATGTCCAAGCACATCGACACCAGGGTCGACAA---------- 1965814

Qy 631 CGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAAC 690 

I I II I I I I I I I I I I I I I I I I I I I I I I II 

Db 1965815 -----GCTGTCGGGTGGTCAACGCAAGCGGGCGTCGGTGGCGCTTGAGCTGTTGACCGGG 1965869

Qy 691 C CAGGAAT CCT CAT T CT GGAT GAAC CCACTT CT GGCCT C GACAGCTT CACAGC C CACAAT 750 

II I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 1965870 CCGTCACTGCTGATCCTCGACGAGCCGACATCCGGCCTAGATCCTGCGCTGGACCGGCAG 
1965929 

Qy 751 CTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCAC 810 

I II I I I II I I I I II I I I I I I I I II I I I I J I I I 

Db 1965930 GTCATGACCATGCTGCGGCAGTTGGCCGACGCCGGTCGGGTGGTGCTCGTGGTTACCCAC 

1965989 



RESULT 7 

US-09-103-840A-1 

; Sequence 1, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M. 

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 
; TITLE OF INVENTION: TUBERCULOSIS 

FILE REFERENCE: 24366-20007.00 
; CURRENT APPLICATION NUMBER: US/09/103,840A
; CURRENT FILING DATE: 1998-06-24 
; NUMBER OF SEQ ID NOS : 2 

SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1 
; LENGTH: 4411529 
TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 

OTHER INFORMATION: H37Rv 
US-09-103-840A-1 

Query Match 2.6%; Score 52.8; DB 3; Length 4411529; 

Best Local Similarity 49.4%; Pred. No. 0.0067; 

Matches 178; Conservative 0; Mismatches 167; Indels 15; Gaps 1; 
Qy 451 GT GC GGCAGCAT GACCAAC T GCT GCC CAAC CT GAC CGT CAGAGAGACC CT GGCTT T CAT T 510 



Db 1974816 GT GC C AC AGGAC GACGT GGT GCAC GGT CAGCT GAC C GT GAAACAC GC GCT GAT GT AT GCC 
1974875 

Qy 511 GCC CAGAT GCGC CT GCC CAGGAC CT T CT C C C AGGC C CAGC GTGACAAACGGGT GGAAGAC 570 

I I I I I I I I I I I I I I I I I I I III II III 

Db 1974876 GCC GAACT ACGGCT GC C GCC GGAC AC C AC CAAAGAT GAC C GCAC C C AGGT AGT T GC C CGG 
1974935 

Qy 571 GTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTA 630 

II III I I I I III II I I I I I I I I I I I I I I I 

Db 1974936 GTGCTCGAAGAACTCGAGATGTCCAAGCACATCGACACCAGGGTCGACAA---------- 1974985

Qy 631 CGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAAC 690 

I I I II I I I I I I I I I I I I I I I I I I I I I II 

Db 1974986 GCTGTCGGGTGGTCAACGCAAGCGGGCGTCGGTGGCGCTTGAGCTGTTGACCGGG 

1975040 

Qy 691 CCAGGAAT C CT CATT CT GGAT GAAC CC ACTT CT GGC CT C GACAGCTT C ACAGC C CACAAT 750 

II I I I I I I I I I I I I M I I I I I I I I I I I III I 

Db 1975041 CCGTCACTGCTGATCCTCGACGAGCCGACATCCGGCCTAGATCCTGCGCTGGACCGGCAG 

1975100 

Qy 751 CTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCAC 810 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II 

Db 1975101 GTCATGACCATGCTGCGGCAGTTGGCCGACGCCGGTCGGGTGGTGCTCGTGGTTACCCAC 
1975160 



RESULT 8 

US-09-614-912-137 

Sequence 137, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION : 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 



PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/143,412
PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/146,650
PRIOR FILING DATE: 1999-07-30
PRIOR APPLICATION NUMBER: 60/170,906
PRIOR FILING DATE: 1999-12-15



PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 137 
LENGTH: 2031 
TYPE: DNA 
ORGANISM: Zea mays 
US-09-614-912-137 

Query Match 2.5%; Score 51; DB 4; Length 2031; 

Best Local Similarity 47.6%; Pred. No. 0.00041; 

Matches 150; Conservative 0; Mismatches 165; Indels 0; Gaps 0; 

Qy 557 AACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGG 616 

II I I I I I I I I I I I I II III I I II I I I I 

Db 424 AATT T GT GGAT GAAGTT AT GGAACTAGT GGAGCT CGACAAT CT GAGGGAT GC CTT AGTT G 483 

Qy 617 GCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGC 676 

I I I I I I I I I I II I I I I I I I I I I I 

Db 484 GGCTACCAGGAATCACAGGGCTTT CGACAGAGCAAAGAAAAAGGTT GACAAT AGCCGT GG 543 

Qy 677 AG CT C CT GT GGAAC CCAGGAAT C CTCAT T CT GGATGAAC C CACT T CT GGC CT C GAC AGCT 736 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 544 AGCT CGTT GCCAATCCAT CAAT CATATTTATGGATGAACCAACATCAGGGCTT GAT GCAA 603 

Qy 737 TCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGC 796 

III I I I I I I I I I I I I I I II II 

Db 604 GAGCT GCAG CAATT GT CAT GAGAACT GT GC GGAACACAGT T GACACT GGACGGACAGT T G 663 

Qy 797 TCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGA 856 

I II I I I I I I I M II I I I I I I I I I I I I I I I II. II 

Db 664 T T T GCACAAT CCAT CAGC CAAGCATC GAC AT CTT T GAAT CT T T T GAT GAGTT GCT AT T GT 723 

Qy 857 T GACAT CT G GCACC C 871 

I I I I I I I I 
Db 724 T GAAAAG AG G AG GC C 738 



RESULT 9 

US-09-489-039A-932/C

Sequence 932, Application US/09489039A 
Patent No. 6610836 
GENERAL INFORMATION: 
APPLICANT: Gary Breton et. al 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
KLEBSIELLA 

TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 2709.2004001 
CURRENT APPLICATION NUMBER: US/09/489,039A
CURRENT FILING DATE: 2000-01-27 
PRIOR APPLICATION NUMBER: US 60/117,747 
PRIOR FILING DATE: 1999-01-29 
NUMBER OF SEQ ID NOS: 14342 
SEQ ID NO 932 



LENGTH: 630 
TYPE: DNA 

ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-932 

Query Match 2.5%; Score 49.8; DB 4 ; Length 630; 

Best Local Similarity 46.4%; Pred. No. 0.00048; 

Matches 162; Conservative 0; Mismatches 187; Indels 0; Gaps 0; 

Qy 558 ACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGG 617 

I I I I I III I I I I I I I I I I I I I I I I I I I II III 

Db 476 AAGGATCGCCGACCGGATCGACGAGCTGATGGCGCTGCTGGGGCTGGAGGCGACGCTGCG 417 

Qy 618 CAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCA 677 

III II II I I I I I I I I I III I I I I I I I III I 

Db 416 CGACCGTTACCCGCATCAGCTCTCCGGCGGCCAGCAGCAGCGGGTGGGGGTGGCGCGGGC 357 

Qy 678 GCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTT 737 

III I I I I I III I I I I I I I I I II I I I I I I I I I 

Db 356 GCTGGCGGCAGATCCGGAGGTGCTGTTGATGGATGAGCCCTTCGGCGCCCTCGACCCGGT 297

Qy 738 CACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCT 797

II II I I I I II I I I I I I I I I I 

Db 296 GACCCGCGAGGCGCTGCAGCAGGAGATGCTGCGCATCCACCGTCTGCTGGGACGGACGAT 237 

Qy 798 CATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGAT 857

I I III I II I I I I I I I I I I I I I I I 

Db 236 T GT GCT GGT GAC C CAT GAT ATT GAC GAAGC GCT GC GT CT GGC GGAC CACCT GGT GCT GAT 177 

Qy 858 GACAT CTGGCACCCCTATCTACCTGGGGGC GGC GCAGCAAAT GGT GCAG 906 

I III I I I I I I I I I I I I I I I I I I I I I I 

Db 176 GGACGGGGGCGAGGTGGTCCAGCAGGGGGCGCCGCTGGAGATGCTCCTG 128 



RESULT 10 

US-09-489-039A-945 

; Sequence 945, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et. al 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
KLEBSIELLA 

; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 

FILE REFERENCE: 2709.2004001 
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27 
; PRIOR APPLICATION NUMBER: US 60/117,747 
; PRIOR FILING DATE: 1999-01-29 
; NUMBER OF SEQ ID NOS: 14342
; SEQ ID NO 945 

LENGTH: 960 

TYPE: DNA 
; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-945 



Query Match 2.5%; Score 49.8; DB 4; Length 960;

Best Local Similarity 46.4%; Pred. No. 0.0006;

Matches 162; Conservative 0; Mismatches 187; Indels 0; Gaps 0;



Qy 558 ACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGG 617 

I I I I I III I I II I I I I I I I I I I I I I I I II III 

Db 336 AAGGATCGCCGACCGGATCGACGAGCTGATGGCGCTGCTGGGGCTGGAGGCGACGCTGCG 395 

Qy 618 CAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCA 677 

III II II I I I I I I I I I III I I I I II I III I 

Db 396 CGACCGTTACCCGCATCAGCTCTCCGGCGGCCAGCAGCAGCGGGTGGGGGTGGCGCGGGC 455

Qy 678 GCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTT 737

III I. I I I I III I I I I I I I I I I I I I I I I I I I I 

Db 456 GCTGGCGGCAGATCCGGAGGTGCTGTTGATGGATGAGCCCTTCGGCGCCCTCGACCCGGT 515 

Qy 738 CACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCT 797

II II I I I I II I I I I I I I I I I 

Db 516 GACCCGCGAGGCGCTGCAGCAGGAGATGCTGCGCATCCACCGTCTGCTGGGACGGACGAT 575 

Qy 798 CATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGAT 857

I I III I II I I I I I I I I I I I I I I I 

Db 576 TGTGCTGGTGACCCATGATATTGACGAAGCGCTGCGTCTGGCGGACCACCTGGTGCTGAT 635 

Qy 858 GACAT CTGGCACC CCTAT CTACCT GGGGGCGGC GCAGCAAAT GGT GCAG 906 

I III I I I I I I I I I I I I I I I I I I I I I I 

Db 636 GGACGGGGGCGAGGTGGTCCAGCAGGGGGCGCCGCTGGAGATGCTCCTG 684 



RESULT 11 

US-09-252-991A-12021/C 

; Sequence 12021, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al . 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252, 991A 

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS : 33142 

; SEQ ID NO 12021 

LENGTH: 627 

TYPE: DNA 
; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-12021 

Query Match 2.5%; Score 49.6; DB 4; Length 627; 

Best Local Similarity 51.3%; Pred. No. 0.00055; 

Matches 115; Conservative 0; Mismatches 109; Indels 0; Gaps 0; 

Qy 591 GCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGA 650 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 583 GCAGCGCTACGGCATGCCGCTGGAGCCTCGCCGGCTGGTCCATGGGCTGTCCATCGGCGA 524 



Qy 651 GCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGA 710 

I I I I I I I I I I I I I I II II III I I I I I I II I I 

Db 523 GCGCCAGCGGGTGGAGATCGTGCGGTGCCTGATGCAGGACATCCGCCTGCTGATCCTCGA 464 

Qy 711 T GAAC C C ACT T CT GGC CT C GAC AGCTT CACAGC C C ACAAT CT GGT GACAAC CT T GT C C CG 770 

I I I I I I I I I I II I I I I I I I I I I I I I I I I III 

Db 463 C GAG C C GACT T C GGT GCT GAC C C C AC GC GAGGC C GAGGAT CT CTT C GT CAC CCTGCGCCG 404 

Qy 771 CCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGC 814 

I I I I II II I I I I I I I I I I I I I I I 

Db 403 TCTTGCGGAAGAGGGCTGCAGTGTCCTCTTCATCAGCCACAAGC 360 



RESULT 12 

US-09-252-991A-11963 

Sequence 11963, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252 , 991A 
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 11963 
LENGTH: 732 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-11963 

Query Match 2.5%; Score 49.6; DB 4; Length 732; 

Best Local Similarity 51.3%; Pred. No. 0.00059; 

Matches 115; Conservative 0; Mismatches 109; Indels 0; Gaps 0; 

Qy 591 GCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGA 650 

I I I I I II I I I II I I I I I I I I I I I I I I I I I I 

Db 43 GCAGCGCTACGGCATGCCGCTGGAGCCTCGCCGGCTGGTCCATGGGCTGTCCATCGGCGA 102 

Qy 651 GCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGA 710 

I I I I I I I I I I I I I I II II III I I I I I I I I I I 

Db 103 GCGCCAGCGGGTGGAGATCGTGCGCTGCCTGATGCAGGACATCCGCCTGCTGATCCTCGA 162 

Qy 711 TGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCG 770

I I I I I I II I I II I I I I I I I II I I I I I I I III 

Db 163 CGAGCCGACTTCGGTGCTGACCCCACGCGAGGCCGAGGATCTCTTCGTCACCCTGCGCCG 222

Qy 771 CCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGC 814 

I I I I II II I I I II I I I I I I I I I I 

Db 223 TCTTGCGGAAGAGGGCTGCAGTGTCCTCTTCATCAGCCACAAGC 266 



RESULT 13 

US-09-252-991A-11890 

Sequence 11890, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al . 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 11890 
LENGTH: 2328 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-11890

Query Match 2.5%; Score 49.6; DB 4; Length 2328; 

Best Local Similarity 51.3%; Pred. No. 0.0011; 

Matches 115; Conservative 0; Mismatches 109; Indels 0; Gaps 0; 

Qy 591 GCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGA 650 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 45 GCAGCGCTACGGCATGCCGCTGGAGCCTCGCCGGCTGGTCCATGGGCTGTCCATCGGCGA 104 

Qy 651 GCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGA 710 

I I I I I I I I I I I I I I II II III I I I I I I I I I I 

Db 105 GCGCCAGCGGGTGGAGATCGTGCGCTGCCTGATGCAGGACATCCGCCTGCTGATCCTCGA 164 

Qy 711 T GAACC C ACT T CT GGCCT CGACAGCTT CACAGCCCACAAT CT GGT GACAAC CTT GT C CC G 770 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 165 CGAGCCGACTTCGGTGCTGACCCCACGCGAGGCCGAGGATCTCTTCGTCACCCTGCGCCG 224 

Qy 771 CCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGC 814 

M I I II II I I I I I I I I I I I I I I I 

Db 225 TCTTGCGGAAGAGGGCTGCAGTGTCCTCTTCATCAGCCACAAGC 268 



RESULT 14 

US-09-252-991A-12050/C 

; Sequence 12050, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18 



PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 12050 
LENGTH: 705
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-12050 

Query Match 2.4%; Score 49.4; DB 4; Length 705;

Best Local Similarity 54.0%; Pred. No. 0.00066; 

Matches 101; Conservative 0; Mismatches 86; Indels 0; Gaps 0; 

Qy 628 GTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGG 687 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II II I 

Db 674 GTCCATGGGCTGTCCATCGGCGAGCGCCAGCGGGTGGAGATCGTGCGCTGCCTGATGCAG 615 

Qy 688 AACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCAC 747 

II I I I I I I I I I I I I I I I I I I I I II I I I I I I 

Db 614 GACAT C C GCCT GCT GAT CCTC GAC GAGC C GACTT C GGT GCT GACC C CAC GC GAGGC C GAG 555 

Qy 748 AATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTC 807

I I I I I III II I II I I I I II II I I I I I I I I I 

Db 554 GATCTCTTCGTCACCCTGCGCCGTCTTGCGGAAGAGGGCTGCAGTGTCCTCTTCATCAGC 495

Qy 808 CACCAGC 814 

III I II 

Db 494 CACAAGC 488 



RESULT 15 
US-08-592-874-1/c

Sequence 1, Application US/08592874 
Patent No. 5854034 
GENERAL INFORMATION: 

APPLICANT: POLLOCK, THOMAS J. 
APPLICANT: YAMAZAKI, MOTOHIDE 
APPLICANT: THORNE, LINDA 
APPLICANT: MIKOLAJCZAK, MARCIA
APPLICANT: ARMENTROUT, RICHARD W. 

TITLE OF INVENTION: DNA SEGMENTS AND METHODS FOR INCREASING 
TITLE OF INVENTION: POLYSACCHARIDE PRODUCTION 
NUMBER OF SEQUENCES: 1 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: JULES E. GOLDBERG 
STREET: 261 MADISON AVENUE 
CITY: NEW YORK 
STATE: NY 
COUNTRY: USA 
ZIP : 10016-2391 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30



CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/592,874
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/377,440 
FILING DATE: 24-JAN-1995 
ATTORNEY/AGENT INFORMATION: 
NAME: GOLDBERG, JULES E. 
REGISTRATION NUMBER: 24,408 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-986-4090
TELEFAX: 212-818-9479
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 28804 base pairs 
TYPE: nucleic acid 
STRANDEDNESS : unknown 
TOPOLOGY: unknown 
MOLECULE TYPE: DNA (genomic) 
FRAGMENT TYPE: N-terminal 
US-08-592-874-1 

Query Match 2.4%; Score 48.6; DB 2; Length 28804; 

Best Local Similarity 46.6%; Pred. No. 0.0074; 

Matches 156; Conservative 0; Mismatches 179; Indels 0; Gaps 0; 

Qy 481 CT GAC C GT CAGAGAGACCCT GGCT T T C ATT GC C CAGAT GCGCCTGCC CAGGAC CTT CT CC 540 

III I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 19377 CTGTTCAGCCGCTCGATCCGCGAGAACATTGCGCTGTCCAACCCGGCGATGCCGTTCGAG 19318 

Qy 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

Mill I I I I I I I II I I I I I I I I I I 

Db 19317 CATGTCGTGGCGGCGGCGACGCTGGCGGGTGCGCATGACTTCATCCTGCGTCAGCCGCGC 19258 

Qy 601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660 

III II II I I I I I I I I I I I I I I I II I II 

Db 19257 GGCTATGACACCGAGATCGTCGAGCGCGGCGTCAACCTGTCGGGCGGCCAGCGCCAGCGG 19198 

Qy 661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720

I III III I I I I I I I I I I I I I I I I I I III 

Db 19197 CTCGCTATCGCCCGCGCGCTGGTCGGCAATCCGCGCATCCTGGTGTTCGACGAGGCGACC 19138 

Qy 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

111 MM I II I I III I I I I I I I 

Db 19137 TCCGCGCTGGATGCCGAGAGCGAGGAGCTGATCCAGAACAATCTGCGCGCCATCTCGGCG 19078 

Qy 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCC 815 

III I I I I I I I I I I I I I I I II I I I I I I 
Db 19077 GGCCGCACGCTGGTGATCATCGCCCACCGCCTGTC 19043 



Search completed: February 26, 2004, 09:45:36 
Job time : 114.675 secs
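
The per-result summary figures in the report above appear to follow simple arithmetic: Query Match looks like Score divided by the perfect score (2019 for this query), and Best Local Similarity looks like Matches divided by the total number of aligned columns (Matches + Conservative + Mismatches + Indels). This relationship is inferred from the numbers printed in the report, not from documentation of the search program, and the function names below are hypothetical. A minimal Python sketch for checking a summary block against it:

```python
import re

# "Perfect score" as printed in the report header for this query.
PERFECT_SCORE = 2019

def parse_summary(text):
    """Pull the numeric fields out of one result's summary lines."""
    fields = {}
    for key in ("Score", "Matches", "Conservative", "Mismatches", "Indels", "Gaps"):
        m = re.search(rf"\b{key}\s+([0-9.]+)\s*;", text)
        if m:
            fields[key] = float(m.group(1))
    return fields

def query_match_pct(score, perfect=PERFECT_SCORE):
    """Assumed relation: Query Match = Score / Perfect score, in percent."""
    return round(100.0 * score / perfect, 1)

def best_local_similarity_pct(fields):
    """Assumed relation: Matches / all aligned columns, in percent."""
    aligned = (fields["Matches"] + fields["Conservative"]
               + fields["Mismatches"] + fields["Indels"])
    return round(100.0 * fields["Matches"] / aligned, 1)

# Summary block of RESULT 3 as printed in the report.
summary = """Query Match 3.1%; Score 62.2; DB 4; Length 4159;
Best Local Similarity 49.1%; Pred. No. 4.5e-07;
Matches 194; Conservative 0; Mismatches 198; Indels 3; Gaps 1;"""

f = parse_summary(summary)
print(query_match_pct(f["Score"]), best_local_similarity_pct(f))
```

A check of this kind can also flag OCR damage: a summary block whose recomputed percentages disagree with the printed ones probably has a misread digit.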
/cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US10C_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:*
Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 

US-09-989-981A-3 
; Sequence 3, Application US/09989981A 
; Publication No. US20030049730A1 
; GENERAL INFORMATION: 
; APPLICANT: Hobbs, Helen H. 
; APPLICANT: Shan, Bei 
; APPLICANT: Barnes, Robert 
; APPLICANT: Tian, Hui 
; APPLICANT: Tularik Inc. 
; APPLICANT: Board of Regents, The University of Texas System 
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 
; FILE REFERENCE: 018781-007320US 
; CURRENT APPLICATION NUMBER: US/09/989,981A 
; CURRENT FILING DATE: 2002-07-23 
; PRIOR APPLICATION NUMBER: US 60/252,235 
; PRIOR FILING DATE: 2000-11-20 
; PRIOR APPLICATION NUMBER: US 60/253,645 
; PRIOR FILING DATE: 2000-11-28 
; NUMBER OF SEQ ID NOS: 13 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 3 
; LENGTH: 2019 
; TYPE: DNA 
; ORGANISM: Mus musculus 
; FEATURE: 
; NAME/KEY: CDS 
; LOCATION: (1)..(2019) 
; OTHER INFORMATION: mouse ABCG8 (mABCG8) 
US-09-989-981A-3 

Query Match 100.0%; Score 2019; DB 10; Length 2019; 
Best Local Similarity 100.0%; Pred. No. 0; 
Matches 2019; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

Qy   61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

Qy  121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

Qy  181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

Qy  241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300

Qy  301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360

Qy  361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420

Qy  421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

Qy  481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

Qy  541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600

Qy  601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660

Qy  661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720

Qy  721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780

Qy  781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

Qy  841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900

Qy  901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960

Qy  961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020

Qy 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080

Qy 1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

Qy 1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 CAGGACACTGACTGTGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACC 1200

Qy 1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 CTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGG 1260



Qy 1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 TCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAG 1320

Qy 1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 CAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTC 1380

Qy 1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 AATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAG 1440

Qy 1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 CTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTG 1500

Qy 1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 CCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTG 1560

Qy 1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 CGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGC 1620

Qy 1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 TGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTC 1680

Qy 1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 TTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGAC 1740

Qy 1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 AACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCG 1800

Qy 1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 GGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACC 1860

Qy 1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 TTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTAT 1920

Qy 1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1921 GCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCC 1980

Qy 1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
        |||||||||||||||||||||||||||||||||||||||
Db 1981 TTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
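
The percentages in each result's summary lines can be cross-checked from the counts printed beside them. The sketch below is not part of the search software. It assumes, based only on the figures in this listing, that Query Match is the score divided by the perfect score (2019, the self-match score of RESULT 1) and that Best Local Similarity is Matches divided by the number of aligned columns (Matches + Mismatches + Indels). The function names are illustrative.

```python
PERFECT_SCORE = 2019  # score of the query aligned against itself (RESULT 1)


def query_match(score, perfect_score=PERFECT_SCORE):
    """Assumed definition: score as a percentage of the perfect score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches, mismatches, indels):
    """Assumed definition: identical columns as a percentage of all aligned columns."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


# (score, matches, mismatches, indels) as printed in the result headers
reported = {
    "RESULT 1": (2019, 2019, 0, 0),
    "RESULT 2": (1430, 1659, 360, 3),
    "RESULT 3": (743.8, 899, 237, 3),
    "RESULT 4": (199.2, 430, 363, 3),
}

for name, (score, matches, mismatches, indels) in reported.items():
    print(name, query_match(score), best_local_similarity(matches, mismatches, indels))
```

Under these assumptions the computed values agree with the percentages printed for RESULT 1 through RESULT 4.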



RESULT 2 

US-09-989-981A-7 
; Sequence 7, Application US/09989981A 
; Publication No. US20030049730A1 
; GENERAL INFORMATION: 
; APPLICANT: Hobbs, Helen H. 
; APPLICANT: Shan, Bei 
; APPLICANT: Barnes, Robert 
; APPLICANT: Tian, Hui 
; APPLICANT: Tularik Inc. 
; APPLICANT: Board of Regents, The University of Texas System 
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 
; FILE REFERENCE: 018781-007320US 
; CURRENT APPLICATION NUMBER: US/09/989,981A 
; CURRENT FILING DATE: 2002-07-23 
; PRIOR APPLICATION NUMBER: US 60/252,235 
; PRIOR FILING DATE: 2000-11-20 
; PRIOR APPLICATION NUMBER: US 60/253,645 
; PRIOR FILING DATE: 2000-11-28 
; NUMBER OF SEQ ID NOS: 13 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 7 
; LENGTH: 2669 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
; FEATURE: 
; NAME/KEY: CDS 
; LOCATION: (100)..(2121) 
; OTHER INFORMATION: human ABCG8 (hABCG8) 
US-09-989-981A-7 

Query Match 70.8%; Score 1430; DB 10; Length 2669; 
Best Local Similarity 82.0%; Pred. No. 0; 
Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 

Qy 1 AT GGCTGAGAAAACCAAAGAAGAGAC C C AGCT GT GGAAT GGGACT GTACTT C AGGAT GCT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 100 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159 

Qy 61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120 

I 1 I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 160 T CGGGCCT CC AGGAT AGATTGTT CTCCT CT GAAAGTGACAACAGCCTGTACTT CACCTAC 219 

Qy 121 AGT GGTCAGT C CAACACT CT GGAGGT C AGAGAT CT C AC CT AC C AGGT GGAC AT C GC CT CT 180 

I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINI 
Db 220 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 279 

Qy 181 CAGGT GCCTT GGT TT GAGC AGCT GGCT C AGTT CAAGAT AC C CT GGAGGT CT CAT AGCAGC 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I III II 

Db 280 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339 

Qy 241 CAAGACT C CT GT GAG CT GGGC AT C CGAAAT CTAAGCT T CAAAGT GAGGAGT GGACAGAT G 300 

II I II I I I I I I I I I I I I I I I I II I I II I I I I I I II I I I I I I I I I I I I I I I I 

Db 340 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 399 

Qy 301 CT G GC CAT C AT AGGGAGCT CAGGCT GC GGGAGAGCCT CACT ACT CGAC GT GAT CACAGGC 360 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I II II I I I I I I I I III 
Db 400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459 

Qy 361 AGAGGCCAC G GT GGCAAGAT GAAAT CAGGACAAATTT GGAT AAATGGGCAAC C CAGT AC G 420 

I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I I I II I I I I I I I I I II 
Db 460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519 



Qy 421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480 

II I I I I I I I I I I I I I I I I I I M II II I I I I I I I I I I I I I I I I I I I I I I I M 

Db 520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579 

Qy 481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540 

M I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 580 T T GACT GT GC GAGAGAC CT T GGC CT T CAT T GCC CAGAT GC GGCT GCC CAGAAC CT T CT C C 639 

Qy 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699 

Qy 601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660 

M I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II 

Db 700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759 

Qy 661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720 

II I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 760 GT CAGCAT T GGGGT GCAGCT C CT GT GGAAC CC AGGAAT C CTT ATT CT CGAC GAAC C CACC 819 

Qy 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I MINIMI I I I I I I I I I 
Db 820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879 

Qy 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II I I I I I I I I II I I I I I I II I I III 

Db 880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939 

Qy 841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

Qy 901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

II I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

Qy 961 T AC GT GGACT T GACC AGC AT C GACAGAC G CAGCAAAGAAC GGGAGGT GGC C ACCGT GGAG 1020 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I II I I I I I I I I I I I I I 
Db 1060 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1119 

Qy 1021 AAGGCACAGTCT CTT GCAGCCCT GTT CCTAGAAAAAGT ACAAGGCTTT GAT GACTTT CTG 1080 

I I I I I I I I I I II I I I I I I I I I I I II I I II I I I I I I I III I I I I I I I I I I I 
Db 1120 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1179 

Qy 1081 T GGAAAGCT GAGGCAAAGGAACT CAACACAAGCACCCACACAGTCAGCCT GACCCT CACA 1140 

I I I II I I I I I I I I I I I I I I M I I I I I III III 

Db 1180 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1239 

Qy 1141 CAGGACACT GACT G T GGGACT GC T GT T GAGCT GC C C GGGAT GATAGAGCAGT T T T CC 1197 

I Mill I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

Db 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299 

Qy 1198 ACC CT GAT C C GT CGT CAGAT T T C CAAT GACT T C CGGGAC CTGC C CAC GCT GCT CAT T CAT 1257 

II I I I I I I I I I I I I I I I I I I I I M I I I I I I I II I I I I I I I I I I I II II I I I III 

Db 1300 AC GCT GAT C C GT CGT CAGAT T T C CAAC GACT T C C GAGACCT GC C CAC C CT CCT CAT CC AT 1359 



Qy 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 

I I I I I I I I I I I I I I I I I I I I J I I I I I I I I I I I I I I I I I I I II I I I I 
Db 1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419 

Qy 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II II I I I I I III 
Db 1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479 

Qy 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1480 TT CAAC GT CAT T CT GGAT GT CAT CT CCAAAT GTT ACT CAGAGAGGGCAAT GCTTT ACT AT 1539 

Qy 1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II II 

Db 1540 GAACT GGAAGAC GGGCT GT ACAC CACT GGT C CAT ATT T CTT T GC CAAGAT CCT C GGGGAG 1599 

Qy 1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659 

Qy 1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

M I II I I I I III I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719 

Qy 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

II I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I II I I I I II I I I I I 

Db 1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779 

Qy 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I 
Db 1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839 

Qy 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

II I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I II I I I I II 

Db 1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899 

Qy 1798 T C GGGGCT GAT GCAGAT T CAAT T T AAT GGACACCT T T ACAC C ACACAAAT CGGCAACTT C 1857 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 1900 GAAGGGCT GAT GAAGAT T CAGT T CAGCAGAAGAACT T AT AAAAT GCCT CT CG GGAACCT C 1959 

Qy 1858 AC CT T CTC C AT C CT C GGAGACAC GAT GAT C AGT GC C ATGGAC CT GAACT C GCAT C CACT C 1917 

M I I I I II I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1960 AC CAT CGC G GT CT CAG GAGAT AAAAT CCT CAGT GCCAT G GAGCTGGACT CGT AC C CTCT C 2019 

Qy 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

I I I I II II I I I I I I I I I I I I III I I I I I II I I I I II I I I I I I I I I I I 
Db 2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079 

Qy 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019 

I II I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 
Db 2080 TCCTTAAGGTTCAT CAAACAGAAACCAAGT CAAGACT GGTGA 2121 
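
The scores in this listing can also be reproduced from the printed match counts, which is a useful consistency check on OCR-damaged headers. The sketch below is an assumption, not documented behaviour of the search program. It takes the gap parameters shown in the report header (Gapop 10.0, Gapext 1.0), charges each gap one opening penalty plus one extension penalty per inserted or deleted residue, and uses a mismatch penalty of 0.6. The 0.6 figure is not stated anywhere in the report; it is simply the value that reproduces the scores printed for RESULT 1 through RESULT 4.

```python
GAP_OPEN = 10.0         # "Gapop" from the report header
GAP_EXTEND = 1.0        # "Gapext" from the report header
MISMATCH_PENALTY = 0.6  # ASSUMPTION: inferred from the printed scores, not from the report


def estimated_score(matches, mismatches, indels, gaps):
    """Estimate an alignment score from the summary counts of one result."""
    gap_cost = gaps * GAP_OPEN + indels * GAP_EXTEND
    return round(matches - MISMATCH_PENALTY * mismatches - gap_cost, 1)


# (matches, mismatches, indels, gaps) -> score printed in the listing
print(estimated_score(2019, 0, 0, 0))    # RESULT 1, printed score 2019
print(estimated_score(1659, 360, 3, 1))  # RESULT 2, printed score 1430
print(estimated_score(899, 237, 3, 1))   # RESULT 3, printed score 743.8
print(estimated_score(430, 363, 3, 1))   # RESULT 4, printed score 199.2
```

Agreement on four results does not prove that the program scores alignments this way; it only shows that the printed counts and scores are mutually consistent under these assumed parameters.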



RESULT 3 

US-10-415-378-29 
; Sequence 29, Application US/10415378 
; Publication No. US20040014945A1 
; GENERAL INFORMATION: 
; APPLICANT: INCYTE CORPORATION; TANG, Y. Tom 
; APPLICANT: YUE, Henry; NGUYEN, Danniel B.; 
; APPLICANT: HAFALIA, April J. A.; ELLIOTT, Vicki S.; 
; APPLICANT: LU, Yan; CHAWLA, Narinder K.; 
; APPLICANT: YAO, Monique G.; BAUGHN, Mariah R.; 
; APPLICANT: GANDHI, Ameena R.; DING, Li; 
; APPLICANT: SANJANWALA, Madhusudan M.; RAMKUMAR, Jayalaxmi; 
; APPLICANT: ARVIZU, Chandra S.; GIETZEN, Kimberly J.; 
; APPLICANT: LAL, Preeti G.; AZIMZAI, Yalda; 
; APPLICANT: KHAN, Farrah A.; THANGAVELU, Kavitha; 
; APPLICANT: THORNTON, Michael B.; LU, Dyung Aina M.; 
; APPLICANT: TRIBOULEY, Catherine M.; WARREN, Bridget A.; 
; APPLICANT: ISON, H. Craig; DAS, Debopriya; 
; APPLICANT: RAUMANN, Brigette E.; POLICKY, Jennifer L.; 
; APPLICANT: KEARNEY, Liam 
; TITLE OF INVENTION: TRANSPORTERS AND ION CHANNELS 
; FILE REFERENCE: PI-0270 USN 
; CURRENT APPLICATION NUMBER: US/10/415,378 
; CURRENT FILING DATE: 2003-05-07 
; PRIOR APPLICATION NUMBER: PCT/US01/46055 
; PRIOR FILING DATE: 2001-10-27 
; PRIOR APPLICATION NUMBER: US 60/250,790 
; PRIOR FILING DATE: 2000-12-01 
; PRIOR APPLICATION NUMBER: US 60/252,232 
; PRIOR FILING DATE: 2000-11-20 
; PRIOR APPLICATION NUMBER: US 60/249,661 
; PRIOR FILING DATE: 2000-11-17 
; PRIOR APPLICATION NUMBER: US 60/247,673 
; PRIOR FILING DATE: 2000-11-09 
; PRIOR APPLICATION NUMBER: US 60/245,904 
; PRIOR FILING DATE: 2000-11-03 
; PRIOR APPLICATION NUMBER: US 60/243,989 
; PRIOR FILING DATE: 2000-10-27 
; NUMBER OF SEQ ID NOS: 40 
; SOFTWARE: PERL Program 
; SEQ ID NO 29 
; LENGTH: 3239 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
; FEATURE: 
; NAME/KEY: misc_feature 
; OTHER INFORMATION: Incyte ID No. US20040014945A1 6585710CB1 
US-10-415-378-29 

Query Match 36.8%; Score 743.8; DB 15; Length 3239; 
Best Local Similarity 78.9%; Pred. No. 7.1e-221; 
Matches 899; Conservative 0; Mismatches 237; Indels 3; Gaps 1; 

Qy 884 GGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATA 943 

Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 12 GGGGCGGCCAGCACATGGTCCATTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 71 

Qy 944 GCAAC C CT GCGGACT T CT AC GT GGACT T GACCAGCAT CGACAGACGCAG CAAAGAAC GGG 1003 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I III I II 
Db 72 GCAAT CCTGCT GACTT CTAT GT GGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 131 



Qy 1004 AGGT GGC CAC C GT GGAGAAGGCAC AGT C T CT T G C AGC C CT GTT C CTAGAAAAAGT ACAAG 1063 

I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 132 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 191 

Qy 1064 GCTTT GAT GACTTTCT GTGGAAAGCTGAGGCAAAGGAACT CAACACAAGCACCCACACAG 1123 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 192 ACT TAGAT GACT T T CTATGGAAAGCAGAGAC GAAGGAT CT T GAC GAGGACAC CT GT GT GG 251 

Qy 1124 T CAGC CT GAC C CT C ACACAGGAC ACTGACT G TGGGACTGCTGTTGAGCTGCCCGGGA 1180 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 252 AAAGCAGC GT GAC C C CACT AGAC AC CAACT GC CTC C C GAGT CCT AC GAAGAT GC CT GGGG 311 

Qy 1181 TGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGC 1240 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 312 CGGT GC AGCAGTT T AC GACGCT GAT CC GT C GT CAGAT TT CCAAC GACTT CCGAGAC CT GC 371 

Qy 1241 CCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTT 1300 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I 
Db 372 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 431 

Qy 1301 ACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGA 1360 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 432 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 491 

Qy 1361 TAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGA 1420 

I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 492 T C GGT GCTCT CAT C C CTTT CAAC GT CAT T CT GGAT GT CAT CTCCAAAT GTT ACT CAGAGA 551 

Qy 1421 GGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTG 1480 

II I I I I I I I I I II I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 552 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 611 

Qy 1481 C CAAGAT CCT AGGAGAATT GCC GGAGC ACT GT GCCT AC GT CAT CAT CT ACGCGAT GCCCA 1540 

I I I I I I I I I I II II I I I I I I I I I I I I It I I I I II I I I I I I I I I I I I I I I II I 
Db 612 C CAAGAT CCT CGGGGAGCT T C C GGAGCACT GT GCCT ACAT CAT CAT CTACGGGAT GCCCA 671 

Qy 1541 TCTACTGGCTGACAAACCTGCGGCCCGT GCCT GAGCTCTTCCTTCTACACTT CCT GCTCG 1600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731 

Qy 1601 TGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCA 1660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III I III I I I I I I I I 
Db 732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791 

Qy 1661 CCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCG 1720 

I I I I I I I I I I I I I I I I I I I I II I I I I I M I I I II I I I I I I I I I II I I I I I I I I 
Db 792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851 

Qy 1721 GCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGT 1780 

I I I I I II I I I I I I I I I I II I I I II I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 852 GCT T CAT GATAAACT T GAGCAGC CT GT GGACAGT GC C C G CGT GGATTT CCAAAGT GTC CT 911 

Qy 1781 TCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCA 1840 

II I I I I I I I I I I I I I I I I I I I M I I I I I I I II I II Ml I I 

Db 912 T C CT GC GGT GGT GTTTT GAAGGGCT GAT GAAGAT T C AGT TCAGCAGAAGAACT T AT AAAA 971 



Qy 1841 CACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACC 1900
          |   |||| ||| ||||| || |  ||   ||||| |  ||  ||||||||||||| |
Db  972 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 1031

Qy 1901 TGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCC 1960
        || ||||| | || ||||| || ||||||||||| ||||| ||| |||||   |||||| 
Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091

Qy 1961 TGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019
        || ||||||||||  | ||||| | | |||||||||||||  ||| |||||||||||||
Db 1092 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 1150
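
Each Qy and Db row carries its own start and end coordinates, so the number of residues on the row can be checked against them. This is a quick way to find rows where extraction has dropped, added, or corrupted characters. The sketch below is illustrative only: the function name and return format are hypothetical, and it assumes the single-row layout `label start sequence end` with `-` marking gap positions.

```python
def check_row(line):
    """Compare the residue count of a 'Qy'/'Db' row with its coordinates."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] not in ("Qy", "Db"):
        return None
    try:
        start, end = int(tokens[1]), int(tokens[-1])
    except ValueError:
        return None
    # Join the middle tokens so stray spaces inside the sequence do not matter;
    # '-' marks an alignment gap and is not counted as a residue.
    residues = [c for c in "".join(tokens[2:-1]) if c != "-"]
    return {
        "label": tokens[0],
        # abs() because database rows on the complementary strand count downwards
        "expected": abs(end - start) + 1,
        "found": len(residues),
        "unexpected": sorted({c for c in residues if c not in "ACGT"}),
    }


rows = [
    "Qy 1014 CGTGGAGAAGGCACAG 1029",
    "Db 1060 CT CCAAGAGAGT C CAG 1075",
    "Qy 1 ACGT7^A 6",  # made-up damaged row for illustration
]
for row in rows:
    print(check_row(row))
```

A row passes when `found` equals `expected` and `unexpected` is empty. A coordinate that has itself been split by a stray space (for example `7 80`) will show up as a large mismatch between the two counts.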



RESULT 4 
US-09-837-992-4 
; Sequence 4, Application US/09837992 
; Patent No. US20020081687A1 
; GENERAL INFORMATION: 
; APPLICANT: Tian, Hui 
; APPLICANT: Schultz, Joshua 
; APPLICANT: Shan, Bei 
; APPLICANT: Tularik Inc. 
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions 
; TITLE OF INVENTION: and Methods of Use 
; FILE REFERENCE: 018781-006020US 
; CURRENT APPLICATION NUMBER: US/09/837,992 
; CURRENT FILING DATE: 2001-04-18 
; PRIOR APPLICATION NUMBER: US 60/198,465 
; PRIOR FILING DATE: 2000-04-18 
; PRIOR APPLICATION NUMBER: US 60/204,234 
; PRIOR FILING DATE: 2000-05-15 
; NUMBER OF SEQ ID NOS: 45 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 4 
; LENGTH: 2340 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
; FEATURE: 
; OTHER INFORMATION: human sitosterolemia gene (SSG) 
; NAME/KEY: CDS 
; LOCATION: (107)..(2062) 
; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG) 
; OTHER INFORMATION: protein 
US-09-837-992-4 

Query Match 9.9%; Score 199.2; DB 9; Length 2340; 
Best Local Similarity 54.0%; Pred. No. 6.1e-51; 
Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 

Qy  234 TAGCAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGG 293
        | || | ||  | |       || |  | ||  | || |   |||  | |||  ||| ||
Db  283 TTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGG 342

Qy  294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353
         |||||  ||  |||| |||| |||||||||| ||||| | || | || || ||||  ||
Db  343 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 402

Qy  354 CACAGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACC 413
          | || ||        | |    ||        ||  |  | |   | || || |   |
Db  403 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 462

Qy  414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473
               |   ||| | |   | | ||| |  |  | || | ||||   |||   |||||
Db  463 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 522

Qy  474 GCCCAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGAC 533
        |  || ||| |||||  | ||||| |||   | ||  || | | ||  | | | | |   
Db  523 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 582

Qy  534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593
        |    ||    ||   |   | ||   |||||| | ||| || || |||||| | ||| |
Db  583 CAATCCCGGCTCCTTCC---AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAG 639

Qy  594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653
         ||    ||  ||       | ||||||       |  | ||  | |||  |||||||||
Db  640 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 699

Qy  654 CCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGA 713
         || || ||   ||| |  |  ||||| ||   | | ||     || |  |  | |||||
Db  700 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 759

Qy  714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773
         || || |  ||||| ||| || | || ||  |  |  | ||      | ||      ||
Db  760 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 819

Qy  774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833
        |||     | ||| |  | ||| |  || || | |||||||| || |||||  | ||   
Db  820 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 879

Qy  834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893
        ||| ||||||    |     |  |||  |  ||    |  || | |   ||  || |   
Db  880 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 939

Qy  894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953
        | ||||| |  | | |||||    |   ||  |||||||||||    ||   ||||||  
Db  940 GGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTT 999

Qy  954 GGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCAC 1013
         ||||||||  ||||| ||||     | || |  |  ||||| ||||||||  | |  ||
Db 1000 TGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAAC 1059

Qy 1014 CGTGGAGAAGGCACAG 1029
        |    |||  |  |||
Db 1060 CTCCAAGAGAGTCCAG 1075





RESULT 5 

US-09-989-981A-5
; Sequence 5, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 5
; LENGTH: 2340
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (107)..(2062)
; OTHER INFORMATION: human ABCG5 (hABCG5)
US-09-989-981A-5

Query Match 9.9%; Score 199.2; DB 10; Length 2340; 

Best Local Similarity 54.0%; Pred. No. 6.1e-51; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 
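
The three summary lines above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the search program's own code. It assumes that Query Match is the score expressed as a percentage of the perfect score (2019 for this query, as given in the report header), and that Best Local Similarity is the number of matches divided by the number of aligned columns (matches + mismatches + indels). Both assumptions reproduce the figures printed for this result; the function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the perfect score (assumed definition)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches, mismatches, indels):
    """Matches as a percentage of aligned columns (assumed definition)."""
    aligned_columns = matches + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Figures taken from the summary lines of this result.
print(query_match_percent(199.2, 2019))            # 9.9
print(best_local_similarity_percent(430, 363, 3))  # 54.0
```

The same two functions can be applied to the summary lines of the other results in this listing as a quick sanity check on figures damaged in transcription.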

Qy 234 TAGCAGC CAAGACT C CTGTGAGCTGGGCATCCGAAAT CTAAGCTT CAAAGTGAGGAGTGG 293 

I I I I I I II III III I I I I III I I I I I I I I I 

Db 283 T T GC C GGC AGC AGT GGAC C AGGCAGAT C CT CAAAGAT GT CT CCT T GT AC GTGGAGAGCGG 342 

Qy 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 

I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 343 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 402

Qy 354 CAC AGGC AGAGGC CAC GGT GGCAAGAT GAAAT CAGGACAAATTT GGAT AAAT GGGCAAC C 413 

I I I I I II II I I I I I I I I I I I I 

Db 403 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 462 

Qy 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 

I I I I I I I I I I I I I I I I I I I I I III I I I I I 

Db 463 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 522 

Qy 474 GC C CAAC CT GAC C GT CAGAGAGAC CCT GGCT T TCAT T GC C CAGAT GC GCCT GC CCAGGAC 533 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 

Db 523 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 582 

Qy 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 

I || II I III I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 583 CAATCCCGGCTCCTTCC AGAAGAAGGT GGAGGC C GT CAT GGC AGAGCT GAGT CT GAG 639 

Qy 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 

II II II I I I I I II I III I I I I I I I I I I I I I 

Db 640 C CAT GT GGCAGAC C GACT GAT T GGCAACT AC AGCT T GGGGGGCAT T T C CAC GGGT GAGC G 699 

Qy 654 C C GAC GAGT GAGC AT T GGGGT GCAGCT C CT GT GGAAC C CAGGAAT CCT CAT T CT GGAT GA 713 

I I I I I I I I I I I I I I I I I I I I II III I I I I I I I 

Db 700 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 759 



Qy 714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773

1 I 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III Ml II

Db 760 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 819

Qy 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833

III 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 1 1 III

Db 820 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 879

Qy 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893

1 1 1 1 1 1 1 1 II 1 Mil II III

Db 880 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 939

Qy 894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953

I | | M 1 1 1 1 1 1 1 II 1 M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1

Db 940 GGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTT 999

Qy 954 GGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCAC 1013

I I 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 M 11.11

Db 1000 TGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAAC 1059

Qy 1014 CGTGGAGAAGGCACAG 1029

1 III 1 III

Db 1060 CTCCAAGAGAGTCCAG 1075





RESULT 6 

US-09-989-981A-1

Sequence 1, Application US/09989981A
Publication No. US20030049730A1
GENERAL INFORMATION:
APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 1
LENGTH: 1959
TYPE: DNA
ORGANISM: Mus musculus
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(1959)
OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-1



Query Match 9.2%; Score 186.6; DB 10; Length 1959; 

Best Local Similarity 53.1%; Pred. No. 4.8e-47; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 

Qy 261 C ATC C GAAAT CTAAGCT T CAAAGT GAG GAGT GGACAGAT GCT GGC CAT CAT AGGGAGCTC 320 

Ml I I I I III I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 207 C CT CAAAGAT GT CTC CT T GTACAT C GAGAGT GGC CAGAT TAT GT G CAT CTT AGGC AGCTC 266 

Qy 321 AGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGAT 380

I I I I I I I I I II 1 Ml I M M I I M 

Db 267 AGGCT CAGGGAAGACCACGCT GCT GGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 326 

Qy 381 GAAAT C AGGACAAAT T T GGATAAAT GGGCAACC CAGTAC GC CT CAGCT GGT GAGGAAGT G 440 

III I II I I I I I I I I I I III 

Db 327 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 386 

Qy 441 CGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCT 500 

II I I I I I I I I I III I I II I I I I I I I II I I I I I I I 

Db 387 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 446

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560 

I I I I I I Mill III I I I I I I I I I I I I 

Db 447 GCGATACACAGC GAT GCTGGCCCTCTGCC GCAGCT C C GCGGACTT CT ACAACAAGAA 503 

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620 

I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I 

Db 504 GGT AGAGGCAGT CAT GACAGAGCT GAGC CT GAGC CAC GT GGCGGAC CAAAT GATT GGCAG 563 

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680 

I II I III I I II I II I I I I I I I I I I I I IN I I II M 

Db 564 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 623 

Qy 681 C CT GT GGAAC C CAGGAAT CCT C ATT CT GGAT GAAC CC ACT TCT GGCCT CGACAGCT T CAC 740 

I M I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 624 CCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGAC 683

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800 

I I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 684 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 743

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860 

I I I II I I II I I I I I I I I I I II I I II I I I I I I I I I I I I I I 

Db 744 CAC CAT C CAC CAGCCT C GCT CT GAGCT CTT C CAACACT T CGACAAAAT T GCCAT CCTGAC 803 

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920 

III III I I I I II I I I I I I I I I I I I 

Db 804 TTACGGAGAGTTGGT GTTCT GT GGCACCCCAGAGGAGAT GCTT GGCTT CTTCAATAACT G 863 

Qy 921 TGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCAT 980 

III I I II I I I I I I II I I I II I I I I I I I I III I I I I I I I 

Db 864 T GGTT AC C C CT GT C CT GAACAT T C CAAT CCCT T TGAT T T TT AC AT GGACTT GAC AT CAGT 923 

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGC 1040

I I I I I I I I I I I I I I I I I II II II I II I I I I I I 

Db 924 GGAC ACC CAAAGCAGAGAGC GGGAAAT AGAAAC GT ACAAGC GAGT AC AGAT GCT GGAAT G 983 



Qy 1041 CCTGTTCCTAGAA 1053

Ill III

Db 984 TGCCTTCAAGGAA 996



RESULT 7 
US-09-837-992-2

Sequence 2, Application US/09837992
Patent No. US20020081687A1
GENERAL INFORMATION:
APPLICANT: Tian, Hui
APPLICANT: Schultz, Joshua
APPLICANT: Shan, Bei
APPLICANT: Tularik Inc.
TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
TITLE OF INVENTION: and Methods of Use
FILE REFERENCE: 018781-006020US
CURRENT APPLICATION NUMBER: US/09/837,992
CURRENT FILING DATE: 2001-04-18
PRIOR APPLICATION NUMBER: US 60/198,465
PRIOR FILING DATE: 2000-04-18
PRIOR APPLICATION NUMBER: US 60/204,234
PRIOR FILING DATE: 2000-05-15
NUMBER OF SEQ ID NOS: 45
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 2
LENGTH: 2258
TYPE: DNA
ORGANISM: Mus musculus
FEATURE:
OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
NAME/KEY: CDS
LOCATION: (47)..(2005)
OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
OTHER INFORMATION: protein
US-09-837-992-2

Query Match 9.2%; Score 186.6; DB 9; Length 2258; 

Best Local Similarity 53.1%; Pred. No. 5.1e-47; 

Matches 421; Conservative 0; Mismatches 369; Indels 3; Gaps 1; 

Qy 261 CATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATGCTGGCCATCATAGGGAGCTC 320

Ml | | | | I I I I I I I I I I I I I I I I M I I II I I I I I I I I I

Db 253 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 312

Qy 321 AGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGAT 380

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II

Db 313 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 372

Qy 381 GAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTG 440

imiii i 1 1 1 1 1 i i i i in

Db 373 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 432

Qy 441 CGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCCT 500

II I I I I I I I I I III I I I I M Ml II II MUM I

Db 433 CTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTT 492

Qy 501 GGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACG 560

i i ii ii 111 mi ii i mi

Db 493 GCGATACACAGC GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAA 549

Qy 561 GGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAA 620

I I I I I I I I I I I I I I I II I I II I I I II II I I I I I I

Db 550 GGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAG 609

Qy 621 CACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCT 680

| Ml Ml I I I I I I I I I I I I I I I I I I I I I I I I M M

Db 610 CTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACT 669

Qy 681 CCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCAC 740

Ml | | | | I M I I I I I I M I I I I I I I I I I I I I I I I I I I

Db 670 CCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGAC 729

Qy 741 AGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCAT 800

| | | I I I I I I II I I I I I I I I I I I I I I I I

Db 730 TGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGT 789

Qy 801 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGAC 860

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I

Db 790 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGAC 849

Qy 861 ATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCAT 920

IN III I I I I I I I I I I I I M I I I I

Db 850 TTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTG 909

Qy 921 TGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCAT 980

Ml | | M I I I I I I M Mill II II Ml II Ill I

Db 910 TGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGT 969

Qy 981 CGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAGC 1040

I I I I I I I I I I I I I I I I I M M II I I I I I I I I I

Db 970 GGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATG 1029

Qy 1041 CCTGTTCCTAGAA 1053 

III II I 
Db 1030 TGCCTTCAAGGAA 1042 



RESULT 8 

US-10-425-114-32175
; Sequence 32175, Application US/10425114
; Publication No. US20040034888A1
; GENERAL INFORMATION:
; APPLICANT: Liu, Jingdong
; APPLICANT: Zhou, Yihua
; APPLICANT: Kovalic, David K.
; APPLICANT: Screen, Steven E
; APPLICANT: Tabaska, Jack E
; APPLICANT: Cao, Yongwei
; TITLE OF INVENTION: Nucleic Acid Molecules and Other Molecules Associated With
; TITLE OF INVENTION: Plants and Uses Thereof for Plant Improvement
; FILE REFERENCE: 38-21(53313)B
; CURRENT APPLICATION NUMBER: US/10/425,114
; CURRENT FILING DATE: 2003-04-28
; NUMBER OF SEQ ID NOS: 73128
; SEQ ID NO 32175
; LENGTH: 2585
; TYPE: DNA
; ORGANISM: Zea mays
; FEATURE:
; OTHER INFORMATION: Clone ID: UC-ZMFLB73274A02_FLI
US-10-425-114-32175

Query Match 8.4%; Score 169.2; DB 12; Length 2585; 

Best Local Similarity 53.4%; Pred. No. 1.5e-41; 

Matches 382; Conservative 0; Mismatches 328; Indels 6; Gaps 1; 

Qy 269 AT CTAAGCTT CAAAGT GAGGAGT GGACAGAT GCT GGCCAT CATAGGGAGCTCAGGCT GCG 328 

| | | I I Ml I II III M I I I I I M I I I I I I I I 

Db 582 AGCTCACCGGGTACGCGGAGCCCGGGTCGCTGACCGCGCTCATGGGGCCCTCGGGGTCCG 641 

Qy 329 GGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGATGAAATCAG 388

II I I I I I I I I I I I I I I I I I I I I M I I Ml 

Db 642 GCAAGTCCACCCTGCTCGACGCCCTCGCCGGCCGCCTCGCCGCCAACGCCTTCCTCTCCG 701

Qy 389 GACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGC 448

III I I I I I I I I I I I I I I I I I I I 

Db 702 GCAACGTGCTCCTCAACGG CCGCAAGGCCAAGCTCTCCTTCGGCGCCGCGGCGT 755 

Qy 449 AT GT G C GG CAGCAT GAC CAACT GCT GC C CAAC CT GAC C GT CAGAGAGAC CCT GGCTTT CA 508 

| I I I | | | I I I I I I I I I I I I M II I I I I I I M II II 

Db 756 ACGT GAC G CAGGAC GACAAC CT GAT CGGGAC GCT GAC GGT GC GCGAGAC GAT C GGCT ACT 815 

Qy 509 TTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAG 568 

II II I I MM I I I I I 

Db 816 CGGCGCTGCTGCGGCTGCCGGACAAGATGCCGCGGGAGGACAAGCGCGCGCTGGTGGAGG 875 

Qy 569 ACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCAACACGTATG 628 

| | | | I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M 

Db 876 GCACCAT CGT CGAGAT GGGGCTGC AGGACT G C GC C GAC AC C GT CAT C GGCAACT GGCACC 935 

Qy 629 TACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGA 688

| I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 936 TCCGCGGGGTCAGCGGCGGCGAGAAGCGCCGCGTCAGCATCGCGCTCGAGCTACTCATGC 995 

Qy 689 ACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCACAGCCCACA 748

M | | II I I I I I I I I I I I I I II I I I I I I I I I I I I I I M I 

Db 996 GCCCGCGCCTCCTCTTCCTCGACGAGCCCACCAGCGGCCTCGACAGCTCGTCTGCGTTCT 1055 

Qy 749 ATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCC 808

|| | || II I I I I I I I I I I I MM II II I I II I II 

Db 1056 TCGTGACGCAGACGCTGCGGGGCCTGGCGAGGGACGGCAGGACGGTGATTGCTTCCATCC 1115 

Qy 809 ACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGACATCTGGCA 868 

I I I I I M M II I I M II I I I I M I I II I I M I I MM 

Db 1116 ACCAGCCCAGCAGCGAGGTGTTCGAGCTCTTCGACATGCTCTTCCTGCTATCCGGGGGCA 1175 

Qy 869 CCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCATTGGCCACC 928

II I M M II I I M I M II 

Db 1176 AGACCGTCTACTTCGGACAAGCATCGCAAGCATGCGAGTTCTTTGCTCAAGCCGGTTTCC 1235 



Qy 929 CTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCATCGAC 984

I I I I I I I I I I I I I I I I I I I I Ml IN 

Db 1236 CTTGCCCGGCTCTGCGGAATCCGTCCGACCATTTCCTGAGGTGCGTCAACTCGGAC 1291 



RESULT 9 

US-09-866-866A-13

Sequence 13, Application US/09866866A
Patent No. US20020102244A1
GENERAL INFORMATION:
APPLICANT: Sorrentino, Brian
APPLICANT: Schuetz, John
TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells
FILE REFERENCE: 1340-1-Q21CIP2
CURRENT APPLICATION NUMBER: US/09/866,866A
CURRENT FILING DATE: 2001-08-30
PRIOR APPLICATION NUMBER: 09/584,586
PRIOR FILING DATE: 2000-05-31
PRIOR APPLICATION NUMBER: PCT/US99/11825
PRIOR FILING DATE: 1999-05-27
PRIOR APPLICATION NUMBER: 60/086,988
PRIOR FILING DATE: 1998-05-28
NUMBER OF SEQ ID NOS: 27
SOFTWARE: PatentIn version 3.0
SEQ ID NO 13
LENGTH: 2025
TYPE: DNA
ORGANISM: Mus musculus
US-09-866-866A-13

Query Match 6.8%; Score 137; DB 9; Length 2025; 

Best Local Similarity 52.3%; Pred. No. 1.5e-31; 

Matches 352; Conservative 0; Mismatches 315; Indels 6; Gaps 2; 

Qy 304 GCCATCAT AGGGAGCT C AGGCT GC GGGAGAGC CT CACT ACT C GAC GT GAT CACAGGC AGA 363 

I I I I IN I I II I I I I I I Ml I I I I I I I Ml I 
Db 245 GCTATTCTGGGACCCACAGGCGGAGGCAAGTCTTCGTTGCTAGATGTCTTAGCAG CA 301 

Qy 364 GGC CAC G GT GGCAAGAT GAAAT C AGGACAAAT T T G GATAAAT GGGCAAC C C AGT AC GCCT 423 

I | I I IN II I I I I I MINIMUM III I III 
Db 302 AGGAAAGAT CCAAAGGGATT ATCT GGAGAT GTTTT GATAAAT GGAGCACC TCAACCT 358 

Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483 

| | | II I I I I II II I I I I I I I I I I I I I I I 

Db 359 GC C CAT T T CAAAT GCT GT T C AGGT TAT GT GGTT CAAGAT GAC GTT GT GAT GGGCACC CT G 418 

Qy 484 ACC GT C AGAGAGAC CCT GGCT TT CAT T GC C CAGAT GCGCCT GC C CAGGAC CTT CT CC CAG 543 

I II I I I I II M I I I I I I I I II I I 

Db 419 AC AGT GAGAGAAAACTT AC AGTT CT CAGCAGCT CTT C GACT T C CAACAACT AT GAAGAAT 478 

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603 

I I I I I I I I I III I I I I I I I I I M 

Db 479 CATGAAAAAAATGAACGGATTAACACAATCATTAAAGAGTTAGGTCTGGAAAAAGTAGCA 538

Qy 604 AACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTG 663 

I || I I I I I II I I II I I I I I I I I II I I I M 

Db 539 GATT CT AAGGT C GGAACT C AGTT TAT C CGT GG C AT CT CT G GAGGAGAAAGAAAAAGGACA 598 



Qy 664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723 

Mill III II I I I I I MM I I II II I I II I I I I I M II I II 

Db 599 AGCATAGGGAT GGAGCT GAT CACT GAC C CT T C CAT CCT CTT CCT GGAT GAGCCCAC GACT 658 

Qy 724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783 

II I II I I I II II I II I III I 111 f I 

Db 659 GGT TT GGACT CAAGCACAGC GAAT GCTGTCCTTTT GCTC CT GAAAAGGATGTCT AAACAG 718 

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843

I I II I II M I I I M M M I I I I M II II I I I II I I II 

Db 719 GGT C GAACAAT CAT CTT CT CCAT T CAT CAGC CT C GGT AT T C CAT CTT TAAGTTGT T T GAC 778 

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903 

I I I M I II M I I MM Ml I II Ml I M 

Db 779 AGCCTCACCTTACTGGCTTCCGGGAAACTCGTGTTCCATGGGCCAGCACAGAAGGCCTTG 838 

Qy 904 CAGT ACTT CACAT C CAT T GGCCACCCTT GT C CT C GCT ATAGCAAC C CT GCGGACT T CT AC 963 

I II II II MM M I II Ml I II I I II II II I II I II II I I 

Db 839 GAGT ACTTT GCAT CAGCAGGTTAC CACT GT GAGC CCTACAACAAC C CT GCGGAT T TTT T C 898 

Qy 964 GT GGACTTGAC CA 976 

III I I II 
Db 899 CTTGATGTCATCA 911



RESULT 10 
US-10-405-806-1
; Sequence 1, Application US/10405806
; Publication No. US20030232362A1
; GENERAL INFORMATION:
; APPLICANT: KOMATANI, HIDEYA
; APPLICANT: HARA, YOSHIKAZU
; APPLICANT: KOTANI, HIDEHITO
; APPLICANT: NAKAGAWA, RINAKO
; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF
; FILE REFERENCE: 234985US0CONT
; CURRENT APPLICATION NUMBER: US/10/405,806
; CURRENT FILING DATE: 2003-04-03
; PRIOR APPLICATION NUMBER: PCT/JP01/08112
; PRIOR FILING DATE: 2001-09-18
; PRIOR APPLICATION NUMBER: JP2000-303441
; PRIOR FILING DATE: 2000-10-03
; NUMBER OF SEQ ID NOS: 17
; SOFTWARE: PatentIn version 3.2
; SEQ ID NO 1
; LENGTH: 2027
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (45)..(2009)
US-10-405-806-1



Query Match 6.6%; Score 132.4; DB 15; Length 2027; 

Best Local Similarity 51.9%; Pred. No. 4e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2;



Qy 304 GCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGA 363 

I I I II I I I I I I II I I I I I I I I I I I I I I I II II I 
Db 273 GCCAT C CT GGGAC CCACAGGT GGAGGCAAAT CTT G GT TATT AGAT G TCTTAGCTGCA 329 

Qy 364 GGCCAC GGT GGCAAGAT GAAAT CAGGACAAAT TT GGATAAAT GGGCAACCCAGTAC GC CT 423 

I III I III III I M I I I I I I I II III I I I 

Db 330 AGGAAAGATCCAAGTGGATTATCTGGAGATGTTCTGATAAATGGAGCACCGCGACCTGCC 389

Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483

I I I I I I II I I I I II I I I I I I I I II III 

Db 390 AATTTCAAATGTAATTCAGGT TACGTGGTACAAGAT GATGT TGT GATGGGCACT CTG 446

Qy 484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543 

I I I I I I I I I I I I III II I I I I I II II I I I 

Db 447 ACGGT GAGAGAAAACT T ACAGT T CT CAGCAGCT CTTCGGCTT GCAACAACT ATGAC GAAT 506 

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603 

I MINIMI II II IN I I III I II 

Db 507 CAT GAAAAAAAC GAAC GGATT AACAGGGT CATT CAAGAGT TAGGT CT GGAT AAAGT GGCA 566 

Qy 604 AACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTG 663 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II 

Db 567 GACTCCAAGGTTGGAACTCAGTTTATCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACT 626

Qy 664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723 

II M II I I I I II I I I I I I I I I I I I I I I I II II II 

Db 627 AGTATAGGAATGGAGCTTATCACTGATCCTTCCATCTTGTTCTTGGATGAGCCTACAACT 686

Qy 724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783

I I I I I I I I I I I I I I II I III I I I I I I I 

Db 687 GGCTTAGACT CAAGCACAGCAAAT GCTGT CCTTTT GCT CCT GAAAAGGATGT CT AAGCAG 746 

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843

I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I 

Db 747 GGAC GAAC AAT CAT C T T CT C CAT T CAT C AG C CT C GAT AT T C CAT CTT CAAGT T GT T T GAT 806 

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903 

II I I I I I I I I I I Ml I II M I I I I I II I II 

Db 807 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 866 

Qy 904 CAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTAC 963 

Mill III III II III Mill I II II II I I I I I I II I 

Db 867 GGATACTTTGAATCAGCTGGTTATCACTGTGAGGCCTATAATAACCCTGCAGACTTCTTC 926

Qy 964 GTGGACTTGA 973 

II I M I I 

Db 927 TTGGACATCA 936 



RESULT 11 
US-10-405-806-12
; Sequence 12, Application US/10405806
; Publication No. US20030232362A1
; GENERAL INFORMATION:
; APPLICANT: KOMATANI, HIDEYA
; APPLICANT: HARA, YOSHIKAZU
; APPLICANT: KOTANI, HIDEHITO
; APPLICANT: NAKAGAWA, RINAKO
; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF
; FILE REFERENCE: 234985US0CONT
; CURRENT APPLICATION NUMBER: US/10/405,806
; CURRENT FILING DATE: 2003-04-03
; PRIOR APPLICATION NUMBER: PCT/JP01/08112
; PRIOR FILING DATE: 2001-09-18
; PRIOR APPLICATION NUMBER: JP2000-303441
; PRIOR FILING DATE: 2000-10-03
; NUMBER OF SEQ ID NOS: 17
; SOFTWARE: PatentIn version 3.2
; SEQ ID NO 12
; LENGTH: 2053
; TYPE: DNA
; ORGANISM: Artificial Sequence
; FEATURE:
; OTHER INFORMATION: ABCG2 482T mutant sequence
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (32)..(1999)
US-10-405-806-12

Query Match 6.6%; Score 132.4; DB 15; Length 2053; 

Best Local Similarity 51.9%; Pred. No. 4.1e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2; 

Qy 304 GCCATCATAGGGAGCT CAGGCT GCGGGAGAGCCT CACTACT CGACGTGATCACAGGCAGA 363 

I I I I I II I I I I I I I I I I I I I I I I M II I 

Db 260 GCCATCCTGGGACCCACAGGTGGAGGCAAATCTTCGTTATTAGATG---TCTTAGCTGCA 316

Qy 364 GGC CAC GGT GGCAAGAT GAAAT CAGGACAAAT TT GGATAAAT GGGCAAC CCAGT ACGC CT 423 

| Ml | I I I I I I I II I I I I I I I I I III I I I 

Db 317 AGGAAAGAT C CAAGT GGAT T AT CT GGAGAT GT T CT GATAAAT GGAGCAC CGC GAC CT GC C 376 

Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483

|| I I I I I I I I I I II I I I I I I I I I I I I I 

Db 377 AAT TT CAAAT GTAATT CAGGT TAC GTGGTACAAGAT GAT GTT GT GAT GGGC ACT CT G 433 

Qy 484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543 

M I I I I I I I I I I III II I I I I I II M I I I 

Db 434 ACGGTGAGAGAAAACTTACAGTTCTCAGCAGCTCTTCGGCTTGCAACAACTATGACGAAT 493

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603 

I I I I I I I I I I I I I I I I I I I I I I I II 

Db 494 CATGAAAAAAACGAACGGATTAACAGGGT CATT CAAGAGTTAGGTCT GGATAAAGTGGCA 553 

Qy 604 AACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTG 663 

| | I I I I I I I I I I I I I I I I I I I I I I M I I II I II 

Db 554 GACT C CAAG GT T GGAACT CAGT TTAT CCGT GGT GTGT CT GGAGGAGAAAGAAAAAGGACT 613 

Qy 664 AGCAT T GG GGT GCAGCT C CT GT G GAACCCAGGAAT C CT CAT T CT GGAT GAACCCACT T CT 723 

| I I I I I I M III I I I I I I I I I I" N H 

Db 614 AGTAT AGGAAT GGAGCTT AT C ACTGAT C CT T CCAT CTT GTT CT T GGAT GAGC CT ACAACT 673 

Qy 724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783 

M | I I I I MINI I II I I I I I I I I I I I 



Db 674 GGCTTAGACTCAAGCACAGCAAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAG 733

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843

1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Db 734 GGACGAACAATCATCTTCTCCATTCATCAGCCTCGATATTCCATCTTCAAGTTGTTTGAT 793

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903

|| | M 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M

Db 794 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 853

Qy 904 CAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTAC 963

| | | | | Ml Ml II III 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Db 854 GGATACTTTGAATCAGCTGGTTATCACTGTGAGGCCTATAATAACCCTGCAGACTTCTTC 913

Qy 964 GTGGACTTGA 973

1 1 1 1 1 1 1

Db 914 TTGGACATCA 923





RESULT 12 
US-09-866-866A-26
; Sequence 26, Application US/09866866A
; Patent No. US20020102244A1
; GENERAL INFORMATION:
; APPLICANT: Sorrentino, Brian
; APPLICANT: Schuetz, John
; TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells
; FILE REFERENCE: 1340-1-021CIP2
; CURRENT APPLICATION NUMBER: US/09/866,866A
; CURRENT FILING DATE: 2001-08-30
; PRIOR APPLICATION NUMBER: 09/584,586
; PRIOR FILING DATE: 2000-05-31
; PRIOR APPLICATION NUMBER: PCT/US99/11825
; PRIOR FILING DATE: 1999-05-27
; PRIOR APPLICATION NUMBER: 60/086,988
; PRIOR FILING DATE: 1998-05-28
; NUMBER OF SEQ ID NOS: 27
; SOFTWARE: PatentIn version 3.0
; SEQ ID NO 26
; LENGTH: 2247
; TYPE: DNA
; ORGANISM: Homo sapiens
US-09-866-866A-26

Query Match 6.6%; Score 132.4; DB 9; Length 2247; 

Best Local Similarity 51.9%; Pred. No. 4.2e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2 

Qy 304 GCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGA 363

I I I I M I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 422 GC C AT C CT GGGACCCAC AGGT GGAGGCAAAT CTT CGT TAT TAGAT G TCTTAGCTGCA 478 

Qy 364 GGC CAC G GT GGCAAGAT GAAAT CAGGACAAAT T T GGAT AAAT GGGCAACC C AGT AC GC CT 423 

I Ml I I I I I I I I II I I I I M I I I III I I I 

Db 479 AGGAAAGATCCAAGTGGATTATCTGGAGATGTTCTGATAAATGGAGCACCGCGACCTGCC 538



Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483



II I 1 1 I I I I II I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 539 AAT T T CAAAT GTAATT C AGGT TACGT GGTACAAGATGAT GTTGT GATGGGCACT CT G 595 

Qy 484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543

I I I I I I I I I I I I III II I II I I II II I I I 

Db 596 ACG GT GAGAGAAAACT T AC AGTT CT CAGCAGCT CT T C GGCT T GCAACAACT AT GAC GAAT 655 

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603

I I I I I I I I I I I I I I I I I I I I I I I II 

Db 656 CAT GAAAAAAAC GAAC GGAT TAACAGGGT CAT T CAAGAGTT AGGT CT GGAT AAAGT GGC A 715 

Qy 604 AACAC C AGAGT G GGCAAC AC GT AT GTACGT GGGGT GT C CGGGGGT GAG C GC CGAC GAGT G 663 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 716 GACT C CAAG GT T GGAACT C AGT TT AT CC GT GGT GT GT CTGGAGGAGAAAGAAAAAGGACT 775 

Qy 664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723 

I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I II 

Db 776 AGTATAGGAATGGAGCTTATCACTGATCCTTCCATCTTGTTCTTGGATGAGCCTACAACT 835

Qy 724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783

I I I I I I I I I I I I I I II I IN I I I I I I I 

Db 836 GGCTTAGACTCAAGCACAGCAAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAG 895

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 896 G GAC GAAC AAT CAT C T T CT C CAT T CAT CAG C C T C GAT AT T C CAT CTT C AAGT T GT T T GAT 955 

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903

II I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 956 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 1015 

Qy 904 CAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTAC 963 

I I I I I III III II III I I I I I I I I I I I I I I I I I I I I I 

Db 1016 GGAT ACTTT GAAT CAGCTGGTT AT CACT GT GAGGCCTATAATAACCCTGCAGACTT CTTC 1075 

Qy 964 GTGGACTTGA 973 

I I I I I I I 
Db 1076 TTGGACATCA 1085 



RESULT 13 
US-09-961-086-2
; Sequence 2, Application US/09961086
; Publication No. US20030036645A1
; GENERAL INFORMATION:
; APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE
; APPLICANT: ROSS, Douglas D.
; APPLICANT: DOYLE, L. Austin
; APPLICANT: ABRUZZO, Lynne
; TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA
; TITLE OF INVENTION: WHICH ENCODES IT
; FILE REFERENCE: EP19376-019
; CURRENT APPLICATION NUMBER: US/09/961,086
; CURRENT FILING DATE: 2001-09-21
; PRIOR APPLICATION NUMBER: US 60/073,763
; PRIOR FILING DATE: 1998-02-05
; PRIOR APPLICATION NUMBER: PCT/US99/02577
; PRIOR FILING DATE: 1999-02-05
; NUMBER OF SEQ ID NOS: 7
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2
; LENGTH: 2418
; TYPE: DNA
; ORGANISM: Homo sapiens
US-09-961-086-2



Query Match 6.6%; Score 132.4; DB 10; Length 2418; 

Best Local Similarity 51.9%; Pred. No. 4.4e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2; 

Qy 304 GC CAT C AT AGGGAGCT CAGGCT GCGGGAGAGC CT CACT ACTC GACGT GAT CAC AGGCAGA 363 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I 
Db 467 G C CAT C CT GGGAC C CACAGGT GGAGGCAAAT CT T C GTT ATT AGAT G TCTTAGCTGCA 523

Qy 364 GGC CAC GGT GGCAAGAT GAAAT CAGGACAAAT TT GGATAAAT GGGCAAC C CAGT ACGC CT 423 

I III I I I I I I I I II I I I I I I I I I Ml I I I 

Db 524 AGGAAAGAT C CAAGT GGATT AT CT GGAGATGT T CT GATAAAT GGAGCAC C GCGAC CT GC C 583 

Qy 424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483 

| | | | | | II I I I I II I I I I I I I I M I I I 

Db 584 AAT TT CAAAT GT AAT T CAGGT TACGT GGTACAAGATGATGTTGT GAT GGGCACT CTG 640 

Qy 484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543 

I I I I I I I I I I I I Ml M I I I I I II II I I I 

Db 641 AC GGT GAGAGAAAACTT AC AGTT CT CAGCAGCT CT T CGGCT T GCAACAACT AT GAC GAAT 700 

Qy 544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603 

I I I I I I I I I I I I I I I II I I I I I I II 

Db 701 CAT GAAAAAAAC GAAC GGAT TAACAGGGT CAT T CAAGAGTT AGGT CT GGAT AAAGT GGCA 760 

Qy 604 AACAC CAGAGTGGGCAACAC GT AT GT AC GT GGGGT GT C C GGGGGT GAGC GCC GACGAGT G 663 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 761 GACTCCAAGGTTGGAACTCAGTTTATCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACT 820

Qy 664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723 

II I I I I I I I I I I I IN I I I I I I I I I I M I I I I II 

Db 821 AGTATAGGAATGGAGCTTATCACTGATCCTTCCATCTTGTTCTTGGATGAGCCTACAACT 880



Qy 724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783 

I I I I I II I I I I II I M I Ml I I I I I I I 

Db 881 GGCTTAGACTCAAGCACAGCAAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAG 940

Qy 784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843 

| I || I I I I I I I I II II I I I I I I I I I I I I II I I I I I I I 

Db 941 GGAC GAACAAT CAT CT TCT C CATT CAT CAGC CT C GAT ATT CCAT CT T CAAGTT GTT T GAT 1000 

Qy 844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903 

|| I II I I I I I I I III I II I I I I II I I I I II 

Db 1001 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 1060 



Qy 904 CAGT ACT T CAC AT C CATT GGC C ACC CT T GT C CT C GCT ATAGCAAC CCT GCGGACT T CT AC 963 

I I I I I III M I I M I I I I I I I I I I I I I M I I I 

Db 1061 GGAT ACT T T GAAT CAGCT GGT TAT C ACTGT GAGGCCT AT AAT AAC CCT GCAGACT T CT T C 1120 



Qy 964 GTGGACTTGA 973 

I I I I I I I 
Db 1121 TTGGACATCA 1130 



RESULT 14 
US-09-981-353-34 

Sequence 34, Application US/09981353 
Patent No. US20020160382A1 
GENERAL INFORMATION:
APPLICANT: Lasek, Amy W. 
APPLICANT: Jones, David A. 

TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
FILE REFERENCE: PA-0038 US 

CURRENT APPLICATION NUMBER: US/09/981,353
CURRENT FILING DATE: 2001-10-11 
NUMBER OF SEQ ID NOS: 194
SOFTWARE: PERL Program 
SEQ ID NO 34 
LENGTH: 2574 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature

OTHER INFORMATION: Incyte ID No. US20020160382A1 5517972CB1 
US-09-981-353-34 

Query Match 6.6%; Score 132.4; DB 9; Length 2574; 

Best Local Similarity 51.9%; Pred. No. 4.5e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2; 

Qy    304 GCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGA 363
          |||||| | ||   | ||||  | || | | | ||  || | || |   ||  ||    |
Db    637 GCCATCCTGGGACCCACAGGTGGAGGCAAATCTTCGTTATTAGATG---TCTTAGCTGCA 693

Qy    364 GGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCT 423
           |  | | |   |       ||| ||| |  ||  |||||||||   |||  |  |  |
Db    694 AGGAAAGATCCAAGTGGATTATCTGGAGATGTTCTGATAAATGGAGCACCGCGACCTGCC 753

Qy    424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483
           |  |     | || |  | |    | |||   ||  ||||     || ||  ||  |||
Db    754 AATTTCAAATGTAATTCAGGT---TACGTGGTACAAGATGATGTTGTGATGGGCACTCTG 810

Qy    484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543
          || || ||||| | | |    |||   ||     | || ||  | |  ||  |  |  |
Db    811 ACGGTGAGAGAAAACTTACAGTTCTCAGCAGCTCTTCGGCTTGCAACAACTATGACGAAT 870

Qy    544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603
              |     || ||||| |  |    || ||    ||| |  | |||    |    ||
Db    871 CATGAAAAAAACGAACGGATTAACAGGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCA 930

Qy    604 AACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTG 663
           || |||  || || |    || | | ||||| ||||| || || ||  |   | |
Db    931 GACTCCAAGGTTGGAACTCAGTTTATCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACT 990

Qy    664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723
          || || ||  || ||||  |     | ||    ||| |  |  ||||||| || ||  ||
Db    991 AGTATAGGAATGGAGCTTATCACTGATCCTTCCATCTTGTTCTTGGATGAGCCTACAACT 1050

Qy    724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783
          ||| | |||     ||||||  |   | |  |      | ||    |  || | |||
Db   1051 GGCTTAGACTCAAGCACAGCAAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAG 1110

Qy    784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843
              |     |  || ||||| | || |||||||| | |  |||||||| | | |||||
Db   1111 GGACGAACAATCATCTTCTCCATTCATCAGCCTCGATATTCCATCTTCAAGTTGTTTGAT 1170

Qy    844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903
              ||    |  || | || || |  | ||| | ||  ||| | || ||| |     ||
Db   1171 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 1230

Qy    904 CAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTAC 963
             |||||   |||   |||  | |  |||     |||||  |||||||| ||||||| |
Db   1231 GGATACTTTGAATCAGCTGGTTATCACTGTGAGGCCTATAATAACCCTGCAGACTTCTTC 1290

Qy    964 GTGGACTTGA 973
           ||||| | |
Db   1291 TTGGACATCA 1300





RESULT 15 
US-10-120-687-60 

; Sequence 60, Application US/10120687 
; Publication No. US20030082155A1 
; GENERAL INFORMATION: 

; APPLICANT: Massachusetts General Hospital
; TITLE OF INVENTION: Stem Cells of the Islets of Langerhans and Their Use in Treating Diabetes
; TITLE OF INVENTION: Mellitus
; FILE REFERENCE: 3284/1235B 

; CURRENT APPLICATION NUMBER: US/10/120,687

; CURRENT FILING DATE: 2002-04-11 

; PRIOR APPLICATION NUMBER: US60/169082 

; PRIOR FILING DATE: 1999-12-06 

; PRIOR APPLICATION NUMBER: US 09/963,875 

; PRIOR FILING DATE: 2001-09-25 

; PRIOR APPLICATION NUMBER: US 60/215109 

; PRIOR FILING DATE: 2000-06-28 

; PRIOR APPLICATION NUMBER: US 60/238880 

; PRIOR FILING DATE: 2000-10-06 

; PRIOR APPLICATION NUMBER: US 09/731261 

; PRIOR FILING DATE: 2000-12-06 

; NUMBER OF SEQ ID NOS: 61

; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 60 
; LENGTH: 2718 

; TYPE: DNA
; ORGANISM: Homo sapiens 
US-10-120-687-60 

Query Match 6.6%; Score 132.4; DB 14; Length 2718; 

Best Local Similarity 51.9%; Pred. No. 4.7e-30; 

Matches 348; Conservative 0; Mismatches 316; Indels 6; Gaps 2;



Qy    304 GCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGA 363
          |||||| | ||   | ||||  | || | | | ||  || | || |   ||  ||    |
Db    433 GCCATCCTGGGACCCACAGGTGGAGGCAAATCTTCGTTATTAGATG---TCTTAGCTGCA 489

Qy    364 GGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCT 423
           |  | | |   |       ||| ||| |  ||  |||||||||   |||  |  |  |
Db    490 AGGAAAGATCCAAGTGGATTATCTGGAGATGTTCTGATAAATGGAGCACCGCGACCTGCC 549

Qy    424 CAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTG 483
           |  |     | || |  | |    | |||   ||  ||||     || ||  ||  |||
Db    550 AATTTCAAATGTAATTCAGGT---TACGTGGTACAAGATGATGTTGTGATGGGCACTCTG 606

Qy    484 ACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAG 543
          || || ||||| | | |    |||   ||     | || ||  | |  ||  |  |  |
Db    607 ACGGTGAGAGAAAACTTACAGTTCTCAGCAGCTCTTCGGCTTGCAACAACTATGACGAAT 666

Qy    544 GCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCC 603
              |     || ||||| |  |    || ||    ||| |  | |||    |    ||
Db    667 CATGAAAAAAACGAACGGATTAACAGGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCA 726

Qy    604 AACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTG 663
           || |||  || || |    || | | ||||| ||||| || || ||  |   | |
Db    727 GACTCCAAGGTTGGAACTCAGTTTATCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACT 786

Qy    664 AGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCT 723
          || || ||  || ||||  |     | ||    ||| |  |  ||||||| || ||  ||
Db    787 AGTATAGGAATGGAGCTTATCACTGATCCTTCCATCTTGTTCTTGGATGAGCCTACAACT 846

Qy    724 GGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGC 783
          ||| | |||     ||||||  |   | |  |      | ||    |  || | |||
Db    847 GGCTTAGACTCAAGCACAGCAAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAG 906

Qy    784 AACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGAC 843
              |     |  || ||||| | || |||||||| | |  |||||||| | | |||||
Db    907 GGACGAACAATCATCTTCTCCATTCATCAGCCTCGATATTCCATCTTCAAGTTGTTTGAT 966

Qy    844 CTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTG 903
              ||    |  || | || || |  | ||| | ||  ||| | || ||| |     ||
Db    967 AGCCTCACCTTATTGGCCTCAGGAAGACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTG 1026

Qy    904 CAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTAC 963
             |||||   |||   |||  | |  |||     |||||  |||||||| ||||||| |
Db   1027 GGATACTTTGAATCAGCTGGTTATCACTGTGAGGCCTATAATAACCCTGCAGACTTCTTC 1086

Qy    964 GTGGACTTGA 973
           ||||| | |
Db   1087 TTGGACATCA 1096



Search completed: February 27, 2004, 07:11:35 
Job time : 467.956 secs
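
The per-result summary lines in this report (Query Match, Best Local Similarity) appear to follow from the other numbers printed beside them. The sketch below is only an inference from those printed values, not documented behavior of the search program: it assumes Query Match is the score divided by the perfect score from the report header (2019), and that Best Local Similarity is the number of matches divided by the total aligned columns (matches + mismatches + indels). The function names are hypothetical.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's perfect score (assumed formula)."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Matching columns as a percentage of all aligned columns (assumed formula)."""
    return 100.0 * matches / (matches + mismatches + indels)


# Values copied from the RESULT 14 / RESULT 15 summary lines;
# 2019 is the "Perfect score" given in the report header.
print(round(query_match_pct(132.4, 2019), 1))            # 6.6
print(round(best_local_similarity_pct(348, 316, 6), 1))  # 51.9
```

Under these assumptions both values agree with the printed "Query Match 6.6%" and "Best Local Similarity 51.9%".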



